After an upgrade of my PC's mainboard BIOS, the boot would take a minute or more to complete, and sometimes the lightdm login screen would sit there but not accept keyboard input for another minute or so. Then the keyboard got enabled and I could log in normally. Everything worked fine once that bootup struggle completed. This was fully reproducible and persisted across reboots. Weird.
The kernel dmesg log showed entries that looked suspicious:
Googling these error -110 and error -71 messages is a bit hard. Why the USB driver does not give useful error messages instead of archaic errno-style numbers escapes me. This is not the 80s anymore.
The wisdom of the crowd says error -110 is something around "the USB port power supply was exceeded" [source].
Now lsusb -tv shows device 1-7 ... to be my USB keyboard. I somehow doubt that it wants more power than the hub is willing to provide.
The Archlinux BBS Forums recommend piecing together information from drivers/usb/host/ohci.h and (updated from their post, which is from 2012) /tools/include/uapi/asm-generic/errno.h. This is why some people then consider -110 to mean "Connection timed out". Nah, not likely either.
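If you do want the errno mapping without digging through kernel headers, Python already ships the same table (a quick sketch; the numeric values shown are the Linux ones):

```python
# Map the kernel's negative errno-style codes to symbolic names and
# human-readable strings (values are platform-specific; Linux assumed).
import errno
import os

for code in (110, 71):
    name = errno.errorcode.get(code, "unknown")
    print(f"error -{code} = {name}: {os.strerror(code)}")
# error -110 = ETIMEDOUT: Connection timed out
# error -71 = EPROTO: Protocol error
```

Note that in the USB host drivers these values are used as low-level transfer status codes, so the symbolic name alone still does not tell you much about the actual hardware condition.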
Reading through the kernel source around drivers/usb/host did not enlighten me either. To the contrary. Uuugly. There seems to be no comprehensive list of what these error codes mean. And the numbers are assigned to error conditions quite arbitrarily. And - of course - there is no documentation. "It was hard to do, so it should be hard to understand as well."
Luckily some of the random musings I read through contained some curious advice: power cycle the host. So I did, and that did not make the error go away. Other people insisted on removing cables from wall sockets, unplugging everything and conducting esoteric rituals. That made it dawn on me: the mainboard of course keeps the USB nicely powered even in the "off" state. So switching the power supply off (yes, these have a separate switch, go find yours), waiting a bit for the capacitors to drain, switching things back on and ... the errors were gone, the system booted within seconds again.
So the takeaway message: If you get random error messages like
on devices that previously worked fine ... completely remove power from the host, the hubs and the USB devices. So they forget they saw each other on the bus before. And when they see each other after that blackout, they will happily go through negotiating protocol details with each other again successfully.
The LinuxCNC project is making headway these days. A lot of
patches and issues have seen activity on
the project github
pages recently. A few weeks ago there was a developer gathering
over at the Tormach headquarters in
Wisconsin, and now we are planning a new gathering in Norway. If you
wonder what LinuxCNC is, let's quote Wikipedia:
"LinuxCNC is a software system for numerical control of
machines such as milling machines, lathes, plasma cutters, routers,
cutting machines, robots and hexapods. It can control up to 9 axes or
joints of a CNC machine using G-code (RS-274NGC) as input. It has
several GUIs suited to specific kinds of usage (touch screen,
interactive development)."
The Norwegian developer gathering takes place the weekend of June 16th
to 18th this year, and is open to everyone interested in contributing
to LinuxCNC. Up to date information about the gathering can be found
in the developer mailing list thread where the gathering was announced.
Thanks to the good people at
Debian,
Redpill-Linpro and
NUUG Foundation, we
have enough sponsor funds to pay for food and shelter for the people
traveling from afar to join us. If you would like to join the
gathering, get in touch.
As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
(Edit 2023-05-10: This has now launched for a subset of Twitter users. The code that existed to notify users that device identities had changed does not appear to have been enabled - as a result, in its current form, Twitter can absolutely MITM conversations and read your messages)
Elon Musk appeared in an interview with Tucker Carlson last month, with one of the topics being the fact that Twitter could be legally compelled to hand over users' direct messages to government agencies since they're held on Twitter's servers and aren't encrypted. Elon talked about how they were in the process of implementing proper encryption for DMs that would prevent this - "You could put a gun to my head and I couldn't tell you. That's how it should be."
tl;dr - in the current implementation, while Twitter could subvert the end-to-end nature of the encryption, it could not do so without users being notified. If any user involved in a conversation were to ignore that notification, all messages in that conversation (including ones sent in the past) could then be decrypted. This isn't ideal, but it still seems like an improvement over having no encryption at all. More technical discussion follows.
For context: all information about Twitter's implementation here has been derived from reverse engineering version 9.86.0 of the Android client and 9.56.1 of the iOS client (the current versions at time of writing), and the feature hasn't yet launched. While it's certainly possible that there could be major changes in the protocol between now and launch, Elon has asserted that they plan to launch the feature this week so it's plausible that this reflects what'll ship.
For it to be impossible for Twitter to read DMs, they need to not only be encrypted, they need to be encrypted with a key that's not available to Twitter. This is what's referred to as "end-to-end encryption", or e2ee - it means that the only components in the communication chain that have access to the unencrypted data are the endpoints. Even if the message passes through other systems (and even if it's stored on other systems), those systems do not have access to the keys that would be needed to decrypt the data.
End-to-end encrypted messengers were initially popularised by Signal, but the Signal protocol has since been incorporated into WhatsApp and is probably much more widely used there. Millions of people per day are sending messages to each other that pass through servers controlled by third parties, but those third parties are completely unable to read the contents of those messages. This is the scenario that Elon described, where there's no degree of compulsion that could cause the people relaying messages to and from people to decrypt those messages afterwards.
But for this to be possible, both ends of the communication need to be able to encrypt messages in a way the other end can decrypt. This is usually performed using AES, a well-studied encryption algorithm with no known significant weaknesses. AES is a form of what's referred to as symmetric encryption, one where encryption and decryption are performed with the same key. This means that both ends need access to that key, which presents us with a bootstrapping problem. Until a shared secret is obtained, there's no way to communicate securely, so how do we generate that shared secret? A common mechanism for this is something called Diffie-Hellman key exchange, which makes use of asymmetric encryption. In asymmetric encryption, an encryption key can be split into two components - a public key and a private key. Both devices involved in the communication combine their private key and the other party's public key to generate a secret that can only be decoded with access to the private key. As long as you know the other party's public key, you can now securely generate a shared secret with them. Even a third party with access to all the public keys won't be able to identify this secret. Signal makes use of a variation of Diffie-Hellman called Extended Triple Diffie-Hellman that has some desirable properties, but it's not strictly necessary for the implementation of something that's end-to-end encrypted.
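To make the Diffie-Hellman step concrete, here is a minimal sketch using Python's cryptography package (the choice of P-256 is simply borrowed from the Twitter implementation described below; the variable names are mine, not anything from an actual client):

```python
# Two parties derive the same shared secret from their own private key and
# the other side's public key; an observer of the public keys alone cannot.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

alice_priv = ec.generate_private_key(ec.SECP256R1())  # NIST P-256
bob_priv = ec.generate_private_key(ec.SECP256R1())

alice_shared = alice_priv.exchange(ec.ECDH(), bob_priv.public_key())
bob_shared = bob_priv.exchange(ec.ECDH(), alice_priv.public_key())
assert alice_shared == bob_shared  # both ends now hold the same secret

# The raw shared secret is normally run through a KDF before use as an AES key.
aes_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"example").derive(alice_shared)
```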
Although it was rumoured that Twitter would make use of the Signal protocol, and in fact there are vestiges of code in the Twitter client that still reference Signal, recent versions of the app have shipped with an entirely different approach that appears to have been written from scratch. It seems simple enough. Each device generates an asymmetric keypair using the NIST P-256 elliptic curve, along with a device identifier. The device identifier and the public half of the key are uploaded to Twitter using a new API endpoint called /1.1/keyregistry/register. When you want to send an encrypted DM to someone, the app calls /1.1/keyregistry/extract_public_keys with the IDs of the users you want to communicate with, and gets back a list of their public keys. It then looks up the conversation ID (a numeric identifier that corresponds to a given DM exchange - for a 1:1 conversation between two people it doesn't appear that this ever changes, so if you DMed an account 5 years ago and then DM them again now from the same account, the conversation ID will be the same) in a local database to retrieve a conversation key. If that key doesn't exist yet, the sender generates a random one. The message is then encrypted with the conversation key using AES in GCM mode, and the conversation key is then put through Diffie-Hellman with each of the recipients' public device keys. The encrypted message is then sent to Twitter along with the list of encrypted conversation keys. When each of the recipients' devices receives the message it checks whether it already has a copy of the conversation key, and if not performs its half of the Diffie-Hellman negotiation to decrypt the encrypted conversation key. Once it has the conversation key, it decrypts the message and shows it to the user.
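As a rough illustration of that flow (this is a sketch of the design as described, not Twitter's actual code; the helper function and all parameter choices are invented for the example):

```python
# Sketch: encrypt a DM with a per-conversation key, then wrap that key for a
# recipient device using ECDH. Hypothetical, for illustration only.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def wrap_conversation_key(conv_key, my_priv, their_pub):
    # Derive a key-encryption key from the ECDH shared secret and use it to
    # encrypt the conversation key for one recipient device.
    shared = my_priv.exchange(ec.ECDH(), their_pub)
    kek = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"key-wrap").derive(shared)
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, conv_key, None)

sender_priv = ec.generate_private_key(ec.SECP256R1())
recipient_priv = ec.generate_private_key(ec.SECP256R1())  # lives on the recipient's device

conversation_key = AESGCM.generate_key(bit_length=256)  # reused for the whole conversation
nonce = os.urandom(12)
ciphertext = AESGCM(conversation_key).encrypt(nonce, b"hello", None)
wrapped = wrap_conversation_key(conversation_key, sender_priv, recipient_priv.public_key())
# The server relays (ciphertext, nonce, wrapped). Anyone who can get a public
# key they control registered for a recipient can unwrap the conversation key.
```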
What would happen if Twitter changed the registered public key associated with a device to one where they held the private key, or added an entirely new device to a user's account? If the app were to just happily send a message with the conversation key encrypted with that new key, Twitter would be able to decrypt that and obtain the conversation key. Since the conversation key is tied to the conversation, not any given pair of devices, obtaining the conversation key means you can then decrypt every message in that conversation, including ones sent before the key was obtained.
(An aside: Signal and WhatsApp make use of a protocol called Sesame which involves additional secret material that's shared between every device a user owns, hence why you have to do that QR code dance whenever you add a new device to your account. I'm grossly over-simplifying how clever the Signal approach is here, largely because I don't understand the details of it myself. The Signal protocol uses something called the Double Ratchet Algorithm to implement the actual message encryption keys in such a way that even if someone were able to successfully impersonate a device they'd only be able to decrypt messages sent after that point even if they had encrypted copies of every previous message in the conversation)
How's this avoided? Based on the UI that exists in the iOS version of the app, in a fairly straightforward way - each user can only have a single device that supports encrypted messages. If the user (or, in our hypothetical, a malicious Twitter) replaces the device key, the client will generate a notification. If the user pays attention to that notification and verifies with the recipient through some out of band mechanism that the device has actually been replaced, then everything is fine. But, if any participant in the conversation ignores this warning, the holder of the subverted key can obtain the conversation key and decrypt the entire history of the conversation. That's strictly worse than anything based on Signal, where such impersonation would simply not work, but even in the Twitter case it's not possible for someone to silently subvert the security.
So when Elon says Twitter wouldn't be able to decrypt these messages even if someone held a gun to his head, there's a condition applied to that - it's true as long as nobody fucks up. This is clearly better than the messages just not being encrypted at all in the first place, but overall it's a weaker solution than Signal. If you're currently using Twitter DMs, should you turn on encryption? As long as the limitations aren't too limiting, definitely! Should you use this in preference to Signal or WhatsApp? Almost certainly not. This seems like a genuine incremental improvement, but it'd be easy to interpret what Elon says as providing stronger guarantees than actually exist.
Let's reflect on some of my recent work that started with understanding Trisquel GNU/Linux, improving transparency into apt-archives, working on reproducible builds of Trisquel, strengthening verification of apt-archives with Sigstore, and finally thinking about security device threat models. A theme in all this is improving methods to have trust in machines, or generally any external entity. While I believe that everything starts by trusting something, usually something familiar and well-known, we need to deal with misuse of that trust that leads to failure to deliver what is desired and expected from the trusted entity. How can an entity behave to invite trust? Let's argue for some properties that can be quantitatively measured, with a focus on computer software and hardware:
Deterministic Behavior - given a set of circumstances, it should behave the same.
Verifiability and Transparency - the method (the source code) should be accessible for understanding (compare the scientific method) and its binaries verifiable, i.e., it should be possible to verify that the entity actually follows the intended deterministic method (implying efforts like reproducible builds and bootstrappable builds).
Accountable - the entity should behave the same for everyone, and deviation should be possible to prove in a way that is hard to deny, implying efforts such as Certificate Transparency and more generic checksum logs like Sigstore and Sigsum.
Liberating - the tools and documentation should be available as free software to enable you to replace the trusted entity if so desired. An entity that wants to restrict you from being able to replace the trusted entity is vulnerable to corruption and may stop acting trustworthy. This point of view reinforces that open source misses the point; it has become too common to use trademark laws to restrict re-use of open source software (e.g., firefox, chrome, rust).
Essentially, this boils down to: Trust, Verify and Hold Accountable. To put this dogma in perspective, it helps to understand that this approach may be harmful to human relationships (which could explain the social awkwardness of hackers), but it remains useful as a method to improve the design of computer systems, and a useful method to evaluate safety of computer systems. When a system fails some of the criteria above, we know we have more work to do to improve it.
How far have we come on this journey? Through earlier efforts, we are in a fairly good situation. Richard Stallman through GNU/FSF made us aware of the importance of free software, the Reproducible/Bootstrappable build projects made us aware of the importance of verifiability, and Certificate Transparency highlighted the need for accountable signature logs leading to efforts like Sigstore for software. None of these efforts would have seen the light of day unless people wrote free software and packaged them into distributions that we can use, and built hardware that we can run it on. While there certainly exists more work to be done on the software side, with the recent amazing full-source build of Guix based on a 357-byte hand-written seed, I believe that we are closing that loop on the software engineering side.
So what remains? Some inspiration for further work:
Accountable binary software distribution remains unresolved in practice, although we have some software components around (e.g., apt-sigstore and guix git authenticate). What is missing is using them for verification by default and/or to improve the signature process to use trustworthy hardware devices, and committing the signatures to transparency logs.
Trustworthy hardware to run trustworthy software on remains a challenge, and we owe FSF's Respect Your Freedom credit for raising awareness of this. Many modern devices require non-free software to work, which fails most of the criteria above and thus makes them inherently untrustworthy.
Verifying rebuilds of currently published binaries on trustworthy hardware is unresolved.
Completing a full-source rebuild from a small seed on trustworthy hardware remains to be done, preferably on a platform wildly different from X86, such as Raptor's Talos II.
We need improved security hardware devices and improved established practices on how to use them. For example, while Gnuk on the FST enables a trustworthy software and hardware solution, the best process for using it that I can think of involves generating the cryptographic keys on a more complex device. Efforts like Tillitis are inspiring here.
Onwards and upwards, happy hacking!
Update 2023-05-03: Added the Liberating property regarding free software, instead of having it be part of "Verifiability and Transparency".
Gray Mountain - John Grisham
I have been perusing John Grisham's books, some read and some re-read again. Almost all of the books that Mr. Grisham wrote are relevant even today. Gray Mountain talks about how mountain-top removal was done in Appalachia, in the U.S. (South). In fact NASA made a summary which either was borrowed from this book or the author borrowed from NASA; either could be true, although it seems it might be the former. And this is when the GOI just made a new Forest Conservation Bill 2023 which does the opposite. There are many examples of the same, the latest from my own city as an example: Vetal Tekdi is and was a haven for people, animals, birds, all kinds of ecosystems, and is a vital lung of the city, one of the few remaining green spaces in the city, but the BJP wants to commercialize it as it has been doing with everything. I haven't been to the Himalayas since 4-5 years back, as I hear of the rape of the land daily. Even after the Joshimath tragedy, if people are not opening their eyes, what can I do? The more I say, the more depressed I will become, so I will leave it for now. In many ways the destruction seems similar to the destruction that happened in Brazil under Jair Bolsonaro. So as can be seen from what I have shared, this book was mostly about the environment and punitive damages, and also how punitive damages have been capped in America (the South). This was done via lobbying by the coal groups and apparently destroyed people's lives, communities, even whole villages. It also shared how many people got black lung and how those claims had been denied by the Coal Companies all the time. And there are hardly any unions. While the book itself is a fiction piece, there is a large amount of truth in it. And that is one of the reasons people write a fiction book. You could write about reality using fictional names and nobody can sue you, while you set out the reality as it is. In many ways, it is a tell-all.
The Testament - John Grisham
One of the things I have loved about John Grisham is that he understands the human condition. This book starts with an eccentric billionaire who makes a will which leaves all his property to an illegitimate child who, coincidentally, lives in Brazil as a missionary. The whole book is about human failings as well as about finding the heir. I am not going to give much detail as the book itself is fun.
The Appeal - John Grisham
In this one, we are introduced to a company that causes a chemical spill lasting decades and how that leads to lobbying and the funding of judicial elections. It goes to quite a bit of length on how private money coming from big business does all kinds of shady things to get its person elected to the Supreme Court. Sadly, this seems to happen all the time, for example two weeks ago. This piece from the Atlantic also says the same. Again, I won't tell more, as there is a bit of irony at the end of the book.
The Rainmaker - John Grisham
This, in short, is about how insurance companies stiff people. It's a wonderful story that has all people in grey, including our hero. An engaging book that sorta tells how the insurance industry plays the game. In India, the companies are safe as they ask for continuances for years together, and their objective is to delay the hearings until even the grandchildren are no longer alive, unlike in the U.S. or UK. It also tells how there are more lawyers than required. Both of which are the same in my country.
The Litigators - John Grisham
This one is about product liability, both medicines as well as toys for young kids. What I do find a bit funny is how the law in the States allows people to file cases but doesn't protect people, while in the EU they try their best so that if there is any controversy around any medicine or product, they will simply ban it. Huge difference between the two cultures.
Evolution no longer part of Indian Education
A few days ago, NCERT (one of the major bodies that looks into Indian education), due to BJP influence, removed Darwin's Theory of Evolution. You can't even debate it, because the people do not understand adaptability. So the only conclusion is that Man suddenly appeared out of nowhere. If that is so, then we are the true aliens. They discard the notion that we and chimpanzees are similar. Then they also have to discard the finding that genetically we are 96% similar.
I'd like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc., and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongle.
Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil Maid attack.
Our focus is to improve security for the end-user rather than the traditional focus of improving security for the organization that provides the token to the end-user, or of improving security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and it leads to surprising recommendations related to onboard key generation, randomness etc. below.
The Substitution Attack
Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.
Terminology
The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good device, but with some minor software modification or a minor component replaced, but it could also be a completely separate physical device. We don't care about that distinction, we just care whether a particular device has a malicious component in it or not. I'll use terms like "security device", "device", "hardware key", "security co-processor" etc. interchangeably.
From an engineering point of view, malicious here includes unintentional behavior such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug.
Don't attribute to malice what can be adequately explained by stupidity, but don't naïvely attribute to stupidity what may be deniable malice.
What is "some period of time"?
Some period of time can be any length of time: seconds, minutes, days, weeks, etc.
It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase.
Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and this may repeat a month later.
What are "harmful consequences"?
Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:
Attacker learns any secret keys stored on a Good Key.
Attacker causes the user to trust a public key generated by a Bad Key.
Attacker is able to sign something using a Good Key.
Attacker learns the PIN code used to unlock a Good Key.
Attacker learns data that is decrypted by a Good Key.
Thin vs Deep solutions
One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This requires an authentication/authorization protocol, which usually involves a unique device identity and out-of-band trust anchors. Such trust anchors are often problematic, since a common use-case for a security device is to connect it to a host that has never seen the device before.
A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker, since the user then cannot differentiate between two different good devices.
In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices.
Upon scrutiny, this solution is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks.
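For concreteness, the class-level attestation approach boils down to something like the following sketch (all names hypothetical; real products typically use X.509 certificate chains rather than bare signatures, but the trust structure is the same):

```python
# Manufacturer signs each device's public key at the factory; a host later
# verifies that signature before trusting the device. Illustrative only.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# At the factory:
manufacturer_key = Ed25519PrivateKey.generate()
device_key = Ed25519PrivateKey.generate()
device_pub = device_key.public_key().public_bytes_raw()
attestation = manufacturer_key.sign(device_pub)

# On the host, with the manufacturer's public key as trust anchor:
try:
    manufacturer_key.public_key().verify(attestation, device_pub)
    print("device belongs to the trusted class")
except InvalidSignature:
    print("refusing to talk to this device")
# Note: this only proves class membership, and the factory signing step is
# itself a target for the substitution attack described above.
```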
The model of the thin solution is similar to the old approach to network firewalls: have a filtering firewall that only lets through intended traffic, and then run completely insecure protocols internally such as telnet.
The networking world has evolved, and now we have defense in depth: even within strongly firewalled networks, it is prudent to run for example SSH with publickey-based user authentication even on local, physically trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking.
I'm arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming they are working in a friendly environment where component substitution never occurs.
Example: Device authentication using PIN codes
To see how this threat model can be applied to reason about security key designs, let's consider a common design.
Many security keys use PIN codes to unlock private key operations, for example OpenPGP cards that lack built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct.
Let's apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device.
This means a good design involves not sending PIN codes in the clear, but using a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that send device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the kdf_do=required option.
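The general idea behind such a key-derivation step can be sketched like this (an illustration only, not the exact OpenPGP card KDF-DO format or parameters): the host derives a value from the PIN and only ever sends that derived value, so a substituted device that records what it receives never sees the PIN itself.

```python
# Host-side PIN derivation: the card stores and compares only the derived
# value, never the raw PIN. Parameters are illustrative, not the KDF-DO spec.
import hashlib
import os

def derive_pin_verifier(pin: str, salt: bytes, iterations: int = 100_000) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, iterations)

salt = os.urandom(16)                      # set on the card at personalization time
stored = derive_pin_verifier("123456", salt)

# Later, the host recomputes the verifier and sends it instead of the PIN.
assert derive_pin_verifier("123456", salt) == stored
```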
Example: Onboard non-predictable key-generation
Many devices offer both onboard key generation, for example OpenPGP cards that generate an Ed25519 key internally on the device, and external generation, where the device imports an externally generated cryptographic key.
Let's apply the substitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key generation, imports the private key into a good device and gives that back to the user. Most of the time, except during key generation, the user uses a good device, but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling.
This means a good design never generates keys on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generate private keys on-board, and ultimately security devices should not even include support for that.
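A minimal sketch of the recommended direction, generating the key in an environment you control and recording the public half before importing the private key into the device (the import step itself is device-specific and omitted here):

```python
# Generate an Ed25519 key on the host; only afterwards would it be imported
# into the security device. Illustrative sketch, not a complete workflow.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()
public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
print(public_pem.decode())  # record this out-of-band before importing the key
```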
Example: Non-predictable randomness-generation
Many devices claim to generate random data, often with elaborate design documents explaining how good the randomness is.
Let's apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know whether any cryptographic keys generated by a generator were faulty or not.
This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generate non-predictable randomness on the device, and ideally devices should not have this functionality.
Case-Study: Tillitis
I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:
Let's apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:
Where weakprng is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.
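To see why such a substitution is so hard to spot, consider this illustration (not Tillitis code): a keyed hash whose key the attacker knows produces output that looks perfectly random to everyone else, yet is fully reproducible by the attacker.

```python
# A "weakprng": statistically indistinguishable from random output, but
# entirely predictable to whoever knows the embedded key. Illustrative only.
import hashlib
import hmac

ATTACKER_KEY = b"known only to the attacker"

def weakprng(*inputs: bytes) -> bytes:
    mac = hmac.new(ATTACKER_KEY, digestmod=hashlib.sha256)
    for part in inputs:
        mac.update(part)
    return mac.digest()

# The user sees 32 bytes that pass any statistical test; the attacker can
# recompute the same value from inputs they know or can obtain.
cdi = weakprng(b"injected-device-secret", b"app-hash", b"user-supplied-secret")
print(cdi.hex())
```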
Conclusion
Remember that it is impossible to fully protect against this attack; that's why it is merely a thought experiment, intended to be used during the design of these devices. Consider an attacker that never gives you access to a good key, so that as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical: many well-funded organizations do what they can to deprive people of access to trustworthy security devices. Philosophically it does not seem possible to tell whether these organizations have succeeded 100% already and there are only bad security devices around, where further resistance is futile, but to end on an optimistic note let's assume that there is a non-negligible chance that they haven't succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that's why the design mantra of mitigating harmful consequences is crucial as a takeaway from this. Let's improve the design of security devices to further the security of their users!
This post serves as a report from my attendance to Kubecon and CloudNativeCon 2023 Europe that took place in
Amsterdam in April 2023. It was my second time physically attending this conference, the first one was in
Austin, Texas (USA) in 2017. I also attended once in a virtual fashion.
The content here is mostly generated for the sake of my own recollection and learnings, and is written from
the notes I took during the event.
The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and
share the excitement about the days ahead. Some astonishing numbers were announced: there were more than
10,000 people attending, and apparently it could confidently be said that it was the largest open source
technology conference taking place in Europe in recent times.
It was also communicated that the next couple of iterations of the event will be run in China in September 2023
and Paris in March 2024.
More numbers: the CNCF was hosting about 159 projects, involving 1300 maintainers and about 200,000
contributors. The cloud-native community is ever-increasing, and there seems to be a strong trend in the
industry for cloud-native technology adoption and all-things related to PaaS and IaaS.
The event program had different tracks, and in each one there was an interesting mix of low-level and higher
level talks for a variety of audiences. On many occasions I found that reading the talk title alone was not
enough to know in advance if a talk was a 101 kind of thing or for experienced engineers. But unlike in
previous editions, I didn't have the feeling that the purpose of the conference was to try selling me
anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of a
given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me.
On a different note, I often found the breakout rooms to be small. I think there were only a couple of rooms
that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized
with frustration that the more interesting talks were immediately fully booked, with people waiting in line
some 45 minutes before the session time. Because of this, I missed a few important sessions that I'll
hopefully watch online later.
Finally, on a more technical side, I've learned many things, which instead of grouping by session I'll group
by topic, given how some subjects were mentioned in several talks.
On gitops and CI/CD pipelines
Most of the mentions went to FluxCD and ArgoCD. At
that point there were no doubts that gitops was a mature approach and both flux and argoCD could do an
excellent job. ArgoCD seemed a bit more over-engineered to be a more general purpose CD pipeline, and flux
felt a bit more tailored for simpler gitops setups. I discovered that both have nice web user interfaces that
I wasn't previously familiar with.
However, in two different talks I got the impression that their initial setup was simple, but migrating
your current workflow to gitops could result in a bumpy ride. That is, the challenge is not deploying
flux/argo itself, but moving everything into a state that both humans and flux/argo can understand. I also
saw some curious mentions of the config drift that can happen in some cases, even if the goal of gitops is
precisely for that to never happen. Such mentions were usually accompanied by some hints on how to handle
the situation by hand.
Worth mentioning: I missed practical information about one of the key pieces of this whole gitops story:
building container images. Most of the showcased scenarios were using pre-built container images, so in that
sense they were simple. Building and pushing to an image registry is one of the two key points we would need
to solve in Toolforge Kubernetes if adopting gitops.
In general, even if gitops was already on our radar for
Toolforge Kubernetes,
I think it climbed a few steps in my priority list after the conference.
Another learning was this site: https://opengitops.dev/.
On etcd, performance and resource management
I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking
about the exact same problems we
have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to disk
latency, IO pressure and network throughput. Even though
Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very
interesting to see others' approaches to the same set of challenges.
I learned how most Kubernetes components and apps can overload the api-server. Because even the api-server
talks to itself. Simple things like kubectl may have a completely different impact on the API depending on
usage, for example when listing all objects (very expensive) vs fetching a single object.
The conclusion was to try to avoid hitting the api-server with LIST calls, and to use ResourceVersion, which
avoids full dumps from etcd (full dumps being, by the way, the default when using bare kubectl get calls). I
already knew some of this, and for example the jobs-framework-emailer was already making use of this
ResourceVersion functionality.
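For reference, the difference looks roughly like this with the official Kubernetes Python client (a sketch; it assumes a working kubeconfig, and the namespace name is just an example): setting resourceVersion="0" tells the api-server that any version is acceptable, so it can answer from its watch cache instead of doing a quorum read, a full dump, from etcd.

```python
# Compare a plain LIST (served from etcd) with one that allows the api-server
# to answer from its cache.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods_from_etcd = v1.list_namespaced_pod("default")                        # expensive
pods_from_cache = v1.list_namespaced_pod("default", resource_version="0") # cheaper
print(len(pods_from_cache.items), "pods (served from the api-server cache)")
```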
There have been a lot of improvements in the performance side of Kubernetes in recent times, or more
specifically, in how resources are managed and used by the system. I saw a review of resource management from
the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware
scheduling decisions and dynamic resource claims (changing the pod resource claims without
re-defining/re-starting the pods).
On cluster management, bootstrapping and multi-tenancy
I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers
themselves. This was of interest to me because as of today we use it for
Toolforge. They shared all
the latest developments and improvements, and the plans and roadmap for the future, with a special mention to
something they called a "kubeadm operator", apparently capable of auto-upgrading the cluster, auto-renewing
certificates and such.
I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was
the best, from the point of view of being a well established and well-known workflow, plus having a very
active contributor base. The kubeadm developers invited the audience to submit feature requests,
so I did.
The different talks confirmed that the basic unit for multi-tenancy in kubernetes is the namespace. Any
serious multi-tenant usage should leverage this. There were some ongoing conversations, in official sessions
and in the hallway, about the right tool to implement K8s-within-K8s, and vcluster
was mentioned enough times for me to be convinced it was the right candidate. This was despite my impression
that multiclusters / multicloud are regarded as hard topics in the general community. I definitely would like to play
with it sometime down the road.
On networking
I attended a couple of basic sessions that served really well for understanding how Kubernetes instruments the
network to achieve its goal. The conference program had sessions covering topics ranging from network
debugging recommendations, CNI implementations, to IPv6 support. Also, one of the keynote sessions had a
reference to how kube-proxy is not able to perform NAT for SIP connections, which is interesting because I
believe Netfilter Conntrack could do it if properly configured. One of the conclusions on the CNI front was
that Calico has massive community adoption (in Netfilter mode), which is reassuring, especially considering
it is the one we use for Toolforge Kubernetes.
On jobs
I attended a couple of talks that were related to HPC/grid-like usages of Kubernetes. I was truly impressed
by some folks out there who were using Kubernetes Jobs on massive scales, such as to train machine learning
models and other fancy AI projects.
It is acknowledged in the community that the early implementation of things like Jobs and CronJobs had some
limitations that are now gone, or at least greatly improved. Some new functionalities have been added as
well. Indexed Jobs, for example, enable each pod in a Job to have a number (index) and process a chunk of a larger
batch of data based on that index. This would allow for full grid-like features like sequential (or again,
indexed) processing, coordination between Jobs, and more graceful Job restarts. My first reaction was: Is that
something we would like to enable in Toolforge Jobs Framework?
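From inside the pod, an Indexed Job worker is pleasingly simple: Kubernetes exposes the pod's index in the JOB_COMPLETION_INDEX environment variable, and the worker uses it to pick its slice of the work (the chunking scheme below is made up for illustration):

```python
# Worker for an Indexed Job: process only the chunk that corresponds to this
# pod's completion index. Chunk size and data source are illustrative.
import os

index = int(os.environ["JOB_COMPLETION_INDEX"])  # 0 .. completions-1
chunk_size = 1000
start, end = index * chunk_size, (index + 1) * chunk_size
print(f"pod #{index}: processing records {start}..{end - 1}")
# ... read and process records [start, end) of the shared input here ...
```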
On policy and security
A surprisingly good amount of sessions covered interesting topics related to policy and security. It was nice
to learn two realities:
Kubernetes is capable of doing pretty much anything security-wise and of creating
greatly secured environments.
It does not do so by default. The defaults are not security-strict on purpose.
It kind of made sense to me: Kubernetes is used for a wide range of use cases, and developers didn't know
beforehand which particular setup the default security levels should accommodate.
One session in particular covered the most basic security features that should be enabled for any Kubernetes
system that would get exposed to random end users. In my opinion, the Toolforge Kubernetes setup was already
doing a good job in that regard. To my joy, some sessions referred to the Pod Security Admission mechanism,
which is one of the key security features we're about to adopt (when migrating away from
Pod Security Policy).
I also learned a bit more about Secret resources, their current implementation and how to leverage a
combo of CSI and RBAC for a more secure usage of external secrets.
Finally, one of the major takeaways from the conference was learning about kyverno and
kubeaudit. I was previously aware of the
OPA Gatekeeper. From the several demos I saw, it
was clear to me that kyverno should help us make Toolforge Kubernetes more sustainable by replacing all of our
custom admission controllers
with it. I already opened a ticket to track this idea, which I'll be
proposing to my team soon.
Final notes
In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had
forgotten because of lack of daily exposure. I'm really happy that the cloud native way of thinking was
reinforced in me, which I still need because most of my muscle memory to approach systems architecture and
engineering is from the old pre-cloud days.
List of sessions I attended on the first day:
Do you want your apt-get update to only ever use files whose hash checksums have been recorded in the globally immutable tamper-resistant ledger rekor provided by the Sigstore project? Well, I thought you'd never ask, but now you can, thanks to my new projects apt-verify and apt-sigstore. I have not done proper stable releases yet, so this is work in progress. To try it out, adapt to the modern era of running random stuff from the Internet as root, and run the following commands. Use a container or virtual machine if you have trust issues.
If the stars are aligned (and the puppet projects of debdistget and debdistcanary have run their GitLab CI/CD pipeline recently enough) you will see a successful output from apt-get update and your syslog will contain debug logs showing the entries from the rekor log for the release index files that you downloaded. See sample outputs in the README.
If you get tired of it, disabling is easy:
chmod -x /etc/apt/verify.d/apt-rekor
Our project currently supports Trisquel GNU/Linux 10 (nabia) & 11 (aramo), PureOS 10 (byzantium), Gnuinos chimaera, Ubuntu 20.04 (focal) & 22.04 (jammy), Debian 10 (buster) & 11 (bullseye), and Devuan GNU+Linux 4.0 (chimaera). Others can be supported too; please open an issue about it, although my focus is on FSDG-compliant distributions and their upstreams.
This is a continuation of my previous work on apt-canary. I have realized that it was better to separate out the generic part of apt-canary into my new project apt-verify that offers a plugin-based method, and then rewrote apt-canary to be one such plugin. Then apt-sigstore's apt-rekor was my second plugin for apt-verify.
Due to the design of things, and some current limitations, Ubuntu is the least stable since they push out new signed InRelease files frequently (mostly due to their use of Phased-Update-Percentage) and the debdistget and debdistcanary CI/CD runs have a hard time keeping up. If you have insight on how to improve this, please comment on the issue tracking the race condition.
There are limitations to what additional safety a rekor-based solution actually provides, but I expect that to improve as I get a cosign-based approach up and running. Currently apt-rekor mostly makes targeted attacks less deniable. With a cosign-based approach, we could design things such that your machine only downloads updates when they have been publicly archived in an immutable fashion, or submitted for validation by a third party such as my reproducible build setup for Trisquel GNU/Linux aramo.
What do you think? Happy Hacking!
Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there.
But these days, the notion of corporate network and home network are less based around physical location. For instance, a company may have no particular office at all, may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home.
And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you'd like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project.
Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can't even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for.
Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably "no".
We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication, from the application layer further down into the network stack. We then arrive at a much simpler picture for all.
So this page is fundamentally about making the network work, simply and effectively.
How do we make the Internet work in these scenarios?
We're going to combine three concepts:
A VPN, providing fully encrypted and authenticated communication and stable IPs
Mesh Networking, in which devices automatically discover optimal paths to reach each other
Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.
By combining these concepts, we arrive at some nice results:
You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are.
Combined with mosh, these sessions will be durable even across moving to other host networks.
You could just as well use telnet, because the underlying network should be secure.
You don't have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that's all you need. hosts.allow could make a comeback!
You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.
There might sometimes be tradeoffs. For instance:
On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can t reach the control plane over the Internet (Internet is down or provider is down)
Some other features that some of the tools provide include:
Easy sharing of limited access with friends/guests
Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.
Let's dive in.
Types of Mesh VPNs
I'll go over several types of meshes in this article:
Fully decentralized with automatic hop routing
This model has no special central control plane. Nodes discover each other in various ways, and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I'll cover include Yggdrasil and tinc.
Automatic peer-to-peer with centralized control
In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I'll cover include Tailscale, Zerotier, Nebula, and Netmaker.
Roll your own and hybrid approaches
This is a grab bag of other ideas; for instance, running Yggdrasil over Tailscale.
Terminology
For the sake of consistency, I'm going to use common language to discuss things that have different terms in different ecosystems:
Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (eg, STUN), and if that fails, it may simply relay traffic between nodes. I'll call such a relay a "broker". This may or may not be the same system that is a control plane for a tool.
All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I'm going to call the unencrypted lower layer, whatever it is, the "clearnet".
Evaluation Criteria
Here are the things I want to see from a solution:
Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
Flexible, adapting to changes in network topology quickly and automatically.
Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
Private, minimizing leakage of information or metadata about me and my systems
Able to traverse CGNAT without having to use a broker whenever possible
A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
Fully or nearly fully Open Source
Free or very cheap for personal use
Wide operating system support, including headless Linux on x86_64 and ARM.
Fully Decentralized VPNs with Automatic Hop Routing
Two systems fit this description: Yggdrasil and Tinc. Let's dive in.
Yggdrasil
I'll start with Yggdrasil because I've written so much about it already. It featured in prior posts such as:
Yggdrasil can be a private mesh VPN, or something more
Yggdrasil can be a private mesh VPN, just like the other tools covered here. It's unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate.
Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away.
Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn't.
The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let's sit with that concept for a moment. Consider:
If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that's fine; all the other machines will discover this route to the world and use it when necessary.
All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it's not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it's asked to.
Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.
Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.)
One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is "special", in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.
Meeting neighbors
There are two ways that Yggdrasil knows about peers:
By broadcast discovery on the local LAN
By listening on a specific port (or being told to connect to a specific host/port)
Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined path. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.
Unique uses
Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices (perhaps ones that had never known of each other before) that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth, basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it.
Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn't require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don't need to figure out a DHCP server or some such. And, Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.
Yggdrasil: Security and Privacy
Yggdrasil's mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously.
I'll discuss firewalling more at the end of this article. Basically, you'll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device.
If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes' listening interfaces, which you'll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn't allow filtering multicast clients by public key, only by network interface, so that's why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two, that the clients just connect to when they're local). But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only.
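A minimal sketch of that closed-mesh configuration follows; the keys are placeholders, and older releases spell the allow-list AllowedEncryptionPublicKeys instead of AllowedPublicKeys.

    {
      # Disable LAN broadcast discovery entirely:
      MulticastInterfaces: [],
      # Only nodes presenting one of these public keys may connect to our listeners:
      AllowedPublicKeys: [
        "00000000000000000000000000000000000000000000000000000000000000aa",
        "00000000000000000000000000000000000000000000000000000000000000bb"
      ]
    }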
Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh, but that list is the public keys of the directly-connected peers, not the clearnet IPs.
Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.
Yggdrasil: Connectivity and NAT traversal
Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance.
Yggdrasil's underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP: it's not, and it is even surprisingly good around bufferbloat. I've found its performance to be on par with the other tools here, and it works as well as I'd expect even on flaky 4G links.
Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443, and Yggdrasil can even make it speak TLS (even though that's unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones).
If you connect your system to multiple public peers, there is a chance, though a very small one, that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that the random connections we use are going to be competitive with datacenter peers.
Yggdrasil: Sharing with friends
If you're open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that's it. (Well, presumably you also open up your firewall; you did follow my advice to set one up, right?)
If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers.
Yggdrasil doesn't directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.
Yggdrasil: DNS
There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it's up to you.
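For example (the addresses and names here are made up; Yggdrasil addresses all fall within 200::/7):

    # /etc/hosts
    200:1111:2222:3333:4444:5555:6666:7777   nas.ygg
    201:aaaa:bbbb:cccc:dddd:eeee:ffff:1234   laptop.ygg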
Yggdrasil: Source code, pricing, and portability
Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made).
There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes' keys). A peer in this case is just a node with a known clearnet IP address.
Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.
Yggdrasil conclusions
Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down.
The unique added feature of being able to be part of a global mesh is a nice one.
The tradeoffs include being more prone to need to use a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn't, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish).
There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, is compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil.
Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.
tinc
tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to an older Yggdrasil without the public option.
I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022.
Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small.
To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host s public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy.
tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn't make sense for a laptop, and I suspect you'd just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP.
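To give a flavor of the per-host configuration, here is a rough tinc 1.0 sketch; the network name, node names, address, and subnet are all made up, so treat this as illustrative only.

    # /etc/tinc/mynet/tinc.conf on the node "laptop"
    Name = laptop
    ConnectTo = vps                 # a node with a clearnet address, akin to a peer/broker

    # /etc/tinc/mynet/hosts/vps -- this file must be copied to every node in the mesh
    Address = vps.example.org       # only meaningful for nodes reachable from outside
    Subnet = 10.10.0.1/32           # the mesh IP this node claims
    # ...RSA public key block, appended by `tincd -n mynet -K`...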
Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh.
Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn't as thorough as that of Yggdrasil.
tinc: Security and Privacy
As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another.
It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.
tinc: Connectivity and NAT traversal
I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.
tinc: Sharing with friends
tinc has no special support for this, and the difficulty of configuration makes it unlikely you'd do this with tinc.
tinc: Source code, pricing, and portability
tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy.
tinc does not seem to be very actively maintained.
tinc conclusions
I haven't mentioned performance in my other reviews (see the section at the end of this post). But it is so poor as to only run about 300Mbps on my 2.5Gbps network. That's 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I'm not going to be using tinc.
Automatic Peer-to-Peer Mesh VPNs with centralized control
These tend to be the options that are frequently discussed. Let's go through them.
Tailscale
Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh ("tailnet"). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs.
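The bootstrap looks roughly like this (Tailscale's convenience install script, then the first login):

    # Install the client:
    curl -fsSL https://tailscale.com/install.sh | sh

    # First run prints an auth URL; opening it in a browser joins the machine to your tailnet:
    sudo tailscale up

    # Show the mesh IP the control plane assigned to this node:
    tailscale ip -4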
When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn't have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker.
Tailscale, like several others, is based around Wireguard; though wireguard-go rather than the in-kernel Wireguard.
Tailscale has a number of somewhat unique features in this space:
Funnel, which lets you expose ports on your system to the public Internet via the VPN.
Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away (sketched just after this list).
Node sharing, which lets you share a subset of your network with guests
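As a sketch of the exit-node workflow from the list above (the client-side IP is made up, and the advertising node typically still needs to be approved in the admin console):

    # On the machine that should carry other nodes' Internet traffic:
    sudo tailscale up --advertise-exit-node

    # On a client, send all Internet traffic through it:
    sudo tailscale up --exit-node=100.64.0.7

    # Switch back to normal routing:
    sudo tailscale up --exit-node=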
Funnel, in particular, is interesting. With a couple of "tailscale serve"-style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it's not really a production solution, but as a quick and easy way to demonstrate something cool to a friend, it's a neat feature.
Tailscale: Security and Privacy
With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale's control plane, or of the credentials you use to access it?
Let's begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals aren't willing to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login.
If a person does compromise your account or the Tailscale servers themselves, they can't directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:
Tamper with your Tailscale ACLs, permitting new actions
Add new nodes to the network
Forcibly remove nodes from the network
Enable or disable optional features
Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that they could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are.
Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below.
Tailscale has a new alpha feature called "tailnet lock" which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn't address ACL tampering and some of the other things, it does represent a significant help with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it.
Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane.
Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impending expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it's most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.
Tailscale: Connectivity and NAT traversal
When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is a fairly brittle setup and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won't work unless your Internet is up and the control plane is reachable.
Assuming your Internet is up and Tailscale's infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here.
Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I'm not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker.
I have found Tailscale to be a bit slow to notice changes in network topography compared to Yggdrasil, and sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it's possible (maybe even probable) that if I'd waited a bit longer, it would have sorted this all out.
Tailscale: Sharing with friends
I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they're invited to, and cannot receive incoming connections from that network, which makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailscale client. The combination of funnel and sharing makes Tailscale the best for ad-hoc sharing.
Tailscale: DNS
Tailscale's DNS is called MagicDNS. It runs as a layer atop your standard DNS, taking over /etc/resolv.conf on Linux, and provides resolution of mesh hostnames and some other features. This is a concept that is pretty slick.
It also is a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can't really say this is entirely Tailscale's fault; they document the problem and some workarounds.
I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that's not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.
Tailscale: Source code, pricing, and portability
Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale's blessing and informal support. It supports most, but not all, features in the Tailscale coordination server.
Tailscale's pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A Community on Github plan also exists, and then there are more business-oriented plans as well. See the pricing page for details.
As a small note, I appreciated Tailscale's install script. It properly added Tailscale's apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.
Tailscale conclusions
Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.
Zerotier
Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I'm mainly going to describe how it differs from Tailscale.
The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface.
However, Zerotier has a number of things that make it a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn't want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I'm not really convinced there's much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow.
A somewhat obscure feature, but possibly useful, is Zerotier s built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.
Zerotier: Security and Privacy
The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably I couldn't find anything about how sticky an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I'm not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock.
For this reason, I didn't proceed much further in my Zerotier evaluation.
Zerotier: Connectivity and NAT traversal
Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it's more limited than Tailscale's, and in particular is incompatible with the double NAT that is often seen these days. Zerotier operates brokers ("root servers") that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.
Zerotier: Sharing with friends
I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.
Zerotier: DNS
Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.
Zerotier: Source code, pricing, and portability
The client ZeroTier One is available on Github under a custom business source license which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is.
The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.
Zerotier conclusions
Frankly I don't see much reason to use Zerotier. The virtual Ethernet model seems to be a weird hybrid that doesn't bring much value. I'm concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that's esoteric enough, and also solvable at other layers, that it doesn't seem all that compelling to me. Add to that the strange license and, to me anyhow, I don't see much reason to bother with it.
Netmaker
Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don't have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I'm failing to dig it up now.
The project is in a bit of an early state; you can sign up for an upcoming closed beta with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it's not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide.
It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade process than most.
I don't want to spend a lot of time managing my mesh. So because of the server's heavy requirements, the labor-intensive upgrades, and its taking over iptables and such on the server, I didn't proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn't seem to be in a state that will meet my needs yet.
Nebula
Nebula is an interesting mesh project that originated within Slack, seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn't have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker ("lighthouse") yourself; perhaps on a $5/mo VPS.
Due to the poor firewall traversal properties, I didn t do a full evaluation of Nebula, but it still has a very interesting design.
Nebula: Security and Privacy
Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:
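It boils down to creating a CA and then signing a per-host certificate; roughly like this (the organization name, host names, IPs, and groups are of course whatever you choose):

    # Create the certificate authority:
    ./nebula-cert ca -name "Myorganization, Inc"

    # Sign a cert for the lighthouse and for a laptop, embedding IP and group membership:
    ./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
    ./nebula-cert sign -name "laptop" -ip "192.168.100.5/24" -groups "laptop,ssh"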
So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps.
This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter.
Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall.
You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven't found information on this, but theoretically, if NAT traversal isn't required, these static mappings may allow Nebula nodes to reach each other even if Internet is down. I don't know if this is truly the case, however.
Nebula: Connectivity and NAT traversal
This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443.
I couldn't find a lot of detail on what Nebula's NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn't as capable as Tailscale's.
You can designate nodes in Nebula as brokers (relays). The concept is the same as Yggdrasil, but it's less versatile. You have to manually designate what relay to use. It's unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.
Nebula: Sharing with friends
There is no particular support here.
Nebula: DNS
Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can't forward requests to a DNS server that's upstream for your laptop's particular current location. Actually, Nebula's DNS server doesn't forward at all. It also doesn't resolve its own name.
The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it's unclear to me if this would make each lighthouse form a complete picture of the network.
Nebula: Source code, pricing, and portability
Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.
Nebula conclusions
I am attracted to Nebula s unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and poor general NAT traversal properties. Its datacenter connectivity heritage does show through.
Roll your own and hybrid
Here is a grab bag of ideas:
Running Yggdrasil over Tailscale
One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale.
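A sketch of the wiring, assuming a made-up Tailscale address of 100.64.0.7 for one node and an arbitrary port of 9001: the node with that address listens, and the other node peers with it over the Tailscale link rather than the clearnet.

    # /etc/yggdrasil.conf fragment on node A (reachable at 100.64.0.7 via Tailscale):
    Listen: ["tcp://0.0.0.0:9001"]

    # /etc/yggdrasil.conf fragment on node B (peers over Tailscale, not the clearnet):
    Peers: ["tcp://100.64.0.7:9001"]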
Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.
Public VPN provider for NAT traversal
A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that.
Be aware that a number of public VPN clients have a kill switch to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad's. You'll need to disable this if you are running a mesh atop it.
Other
Combining with local firewalls
For most of these tools, I recommend using a local firewall in conjunction with them. I have been using firehol and find it to be quite nice. This means you don't have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide a strong association between IP address and node. Most, but not all, do.
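If you would rather see the idea expressed in plain nftables terms (this is an illustration, not my actual firehol setup; the interface name and trusted mesh address are made up):

    # Leave non-mesh traffic alone, but on the mesh interface only allow replies
    # and ssh from one trusted mesh IP, and drop everything else.
    table inet meshfw {
      chain input {
        type filter hook input priority 0; policy accept;
        iifname "ygg0" ct state established,related accept
        iifname "ygg0" ip6 saddr 200:1111:2222:3333:4444:5555:6666:7777 tcp dport 22 accept
        iifname "ygg0" drop
      }
    }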
Performance
I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.
Tool                 iperf3 (default)   iperf3 -P 10   iperf3 -R
Direct (no VPN)      2406               2406           2764
Wireguard (kernel)   1515               1566           2027
Yggdrasil             892               1126           1105
Tailscale             950               1034           1085
Tinc                  296                300            277
You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.
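For reference, the three columns correspond to invocations roughly like these (the address is a placeholder for the peer being tested):

    # On one machine:
    iperf3 -s

    # On the other:
    iperf3 -c 10.0.0.2           # default single-stream test
    iperf3 -c 10.0.0.2 -P 10     # ten parallel streams
    iperf3 -c 10.0.0.2 -R        # reverse direction (server transmits)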
IP collisions
When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. This protects against two malicious actor scenarios:
Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources
To summarize the state of play as highlighted in the reviews above:
Yggdrasil derives IPv6 addresses from a public key
tinc allows any node to set any IP
Tailscale IPs aren't user-assignable, but the assignment algorithm is unknown
Zerotier allows any IP to be allocated to any node at the control plane
I don't know what Netmaker does
Nebula IPs are baked into the cert and signed by the CA, but I haven't verified the enforcement algorithm
So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level).
So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP?
Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously by taking the dedicated IP, a lot more bits are available for the hash of the node's public key, making collisions technically impractical, if not outright impossible. However, if you use the /64 prefix, a collision may be more possible. Yggdrasil's hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with.
Tailscale doesn't document their IP assignment algorithm, but I think it is safe to say that the larger the subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well.
So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.
Final thoughts
For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale.
Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I'd like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up.
I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That's between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos!
Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it's possible I got something wrong somewhere. Please let me know if you find any errors.
This blog post is a copy of a page on my website. That page may be periodically updated.
The Last Hero is the 27th Discworld novel and part of the Rincewind
subseries. This is something of a sequel to Interesting Times and relies heavily on the cast that was built
up in previous books. It's not a good place to start with the series.
At last, the rare Rincewind novel that I enjoyed. It helps that Rincewind
is mostly along for the ride.
Cohen the Barbarian and his band of elderly heroes have decided they're
tired of enjoying their spoils and are going on a final adventure.
They're going to return fire to the gods, in the form of a giant bomb.
The wizards in Ankh-Morpork get wind of this and realize that an explosion
at the Hub where the gods live could disrupt the magical field of the
entire Disc, effectively destroying it. The only hope seems to be to
reach Cori Celesti before Cohen and head him off, but Cohen is already
almost there. Enter Lord Vetinari, who has Leonard of Quirm design a
machine that will get them there in time by slingshotting under the Disc
itself.
First off, let me say how much I love the idea of returning fire to the
gods with interest. I kind of wish Pratchett had done more with their
motivations, but I was laughing about that through the whole book.
Second, this is the first of the illustrated Discworld books that I've
read in the intended illustrated form (I read the paperback version of
Eric), and this book is gorgeous. I
enjoyed Paul Kidby's art far more than I had expected to. His style is what
I will call, for lack of better terminology due to my woeful art
education, "highly detailed caricature." That's not normally a style that
clicks with me, but it works incredibly well for Discworld.
The Last Hero is richly illustrated, with some amount of art, even if
only a subtle background behind the text, on nearly every page. There are
several two-page spreads, but oddly I thought those (including the parody
of The Scream on the cover) were the worst art of the book. None
of them did much for me. The best art is in the figure studies and subtle
details: Leonard of Quirm's beautiful calligraphy, his numerous sketches,
the labeled illustration of the controls of the flying machine, and the
portraits of Cohen's band and the people they encounter. The edition I
got is printed on lovely, thick glossy paper, and the subtle art texture
behind the writing makes this book a delight to read. I'm not sure if,
like Eric, this book comes in other editions, but if so, I highly
recommend getting or finding the high-quality illustrated edition for the
best reading experience.
The plot, like a lot of the Rincewind books, doesn't amount to much, but I
enjoyed the mission to intercept Cohen. Leonard of Quirm is a great
character, and the slow revelation of his flying machine design (which I
will not spoil) is a delightful combination of Leonardo da Vinci parody,
Discworld craziness, and NASA homage. Also, the Librarian is involved,
which always improves a Discworld book. (The Luggage, sadly, is not; I
would have liked to have seen a richly-illustrated story about it, but it
looks like I'll have to find the illustrated version of Eric for
that.)
There is one of Pratchett's philosophical subtexts here, about heroes and
stories and what it means for your story to live on. To be honest, it
didn't grab me; it's mostly subtext, and this particular set of characters
weren't quite introspective enough to make the philosophy central to the
story. Also, I was perhaps too sympathetic to Cohen's goals, and thus not
very interested in anyone successfully stopping him. But I had a lot more
fun with this one than I usually do with Rincewind books, helped
considerably by the illustrations. If you've been skipping Rincewind
books in your Discworld read-through and have access to the illustrated
edition of The Last Hero, consider making an exception for this
one.
Followed by The Amazing Maurice and His Educated Rodents in
publication order and, thematically, by Unseen Academicals.
Rating: 7 out of 10
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
tl;dr: go to reproduce-trisquel.
When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more in general. I have made some further work in that area (hint: apt-verify) that deserves its own blog post eventually. Most of apt archive transparency is futile if we don't trust the intended packages that are in the archive. One way to measurably increase trust in the packages is to provide reproducible builds of them, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees, we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of their success: the results are tangible and measurable. The field of software engineering is in need of more such practices.
The design of my setup to build Trisquel reproducibly is as follows.
The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by committing them into GitLab-hosted git repositories. I maintain several such repositories for popular apt archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a scheduled pipeline to do the downloading, and there are some race conditions here.
The project debdistdiff is used to produce the list of added and modified packages, which are the input to actually being able to know which packages to reproduce. It publishes a human-readable summary of differences for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives is greater than my trust in the added/modified packages in Trisquel.
The final project reproduce-trisquel puts the pieces together, briefly as follows; everything is driven from its .gitlab-ci.yml file.
There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
There is a (scheduled) job that does two things: from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. The second part is generate-readme.sh, which re-generates the project's README.md based on the build logs and diffoscope outputs that are stored in the git repository.
Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered so as not to overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, storing the build log and diffoscope output in the git project itself.
I did not expect to be able to use the GitLab shared runners to do the building, however they turned out to work quite well, and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that either required too much disk space or took over two hours to build. Today I finally took the time to set up a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon; one of my Dell R630 servers with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance.
Current limitations and ideas on further work (most are filed as project issues) include:
We don't support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward; anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID symbol inside the ELF binary, see the example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-tools in the randomized path /build/initramfs-tools-bzRLUp, and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, turning that failure into a success only moved the needle from 42% to 43% reproducibility; however, I didn't let that stand in the way of a good headline.
The mechanism to download the Release/Packages files from dists/ is not fool-proof: we may not capture all such files that were ever published. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
Having at least one other CPU architecture would be nice.
Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but new versions still happen due to security updates or backports.
Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I'm also looking at Devuan because of Gnuinos.
The elephant in the room is how reproducible Ubuntu is in the first place.
Happy Easter Hacking!
Update 2023-04-17: The original project reproduce-trisquel that was announced here has been archived and replaced with two projects, one generic debdistreproduce and one with results for Trisquel: reproduce/trisquel.
I recently bought a Banana Pi BPI-M5, which uses the Amlogic S905X3 SoC: these are my notes about installing Debian on it.
While this SoC is supported by the upstream U-Boot it is not supported by the Debian U-Boot package, so debian-installer does not work. Do not be fooled by seeing the DTB file for this exact board being distributed with debian-installer: all DTB files are, and it does not mean that the board is supposed to work.
As I documented in #1033504, the Debian kernels are currently missing some patches needed to support the SD card reader.
I started by downloading an Armbian Banana Pi image and booted it from an SD card. From there I partitioned the eMMC, which always appears as /dev/mmcblk1:
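Something along these lines; the sizes here are only an example, the important part being the gap before the first partition:

    # Create an MBR partition table and a single root partition, leaving ~16 MiB
    # free at the start of the eMMC for U-Boot:
    parted /dev/mmcblk1 -- mklabel msdos
    parted /dev/mmcblk1 -- mkpart primary ext4 16MiB 100%
    mkfs.ext4 /dev/mmcblk1p1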
Make sure to leave enough space before the first partition, or else U-Boot will overwrite it: as it is common for many ARM SoCs, U-Boot lives somewhere in the gap between the MBR and the first partition.
I looked at Armbian's /usr/lib/u-boot/platform_install.sh and installed U-Boot by manually copying it to the eMMC:
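From what I remember of Armbian's helper, the copy boils down to something like the following (the U-Boot binary path is illustrative); writing the first 444 bytes separately avoids clobbering the MBR partition table:

    # Bootstrap code up to, but not including, the partition table:
    dd if=u-boot.bin of=/dev/mmcblk1 bs=1 count=444 conv=fsync
    # The rest of U-Boot, starting at sector 1:
    dd if=u-boot.bin of=/dev/mmcblk1 bs=512 skip=1 seek=1 conv=fsync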
I wanted to have a fully working flash-kernel, so I used Armbian's boot.scr as a template to create /etc/flash-kernel/bootscript/bootscr.meson and then added a custom entry for the Banana Pi to /etc/flash-kernel/db:
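The stanza looks something like this; the Machine string must match the board's device-tree model exactly, and the DTB name is a guess to be checked against what the kernel package actually ships:

    Machine: Banana Pi BPI-M5
    Kernel-Flavors: arm64
    DTB-Id: amlogic/meson-sm1-bananapi-m5.dtb
    Boot-Script-Path: /boot/boot.scr
    U-Boot-Script-Name: bootscr.meson
    Required-Packages: u-boot-tools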
All things considered, I do not think that I would recommend that Debian users buy Amlogic-based boards, since there are many other better supported SoCs.
Review: The Nordic Theory of Everything, by Anu Partanen
Publisher: Harper
Copyright: 2016
Printing: June 2017
ISBN: 0-06-231656-7
Format: Kindle
Pages: 338
Anu Partanen is a Finnish journalist who immigrated to the United States.
The Nordic Theory of Everything, subtitled In Search of a
Better Life, is an attempt to explain the merits of Finnish approaches to
government and society to a US audience. It was her first book.
If you follow US policy discussion at all, you have probably been exposed
to many of the ideas in this book. There was a time when the US left was
obsessed with comparisons between the US and Nordic countries, and while
that obsession has faded somewhat, Nordic social systems are still
discussed with envy and treated as a potential model. Many of the topics
of this book are therefore predictable: parental leave, vacation, health
care, education, happiness, life expectancy, all the things that are far
superior in Nordic countries than in the United States by essentially
every statistical measure available, and which have been much-discussed.
Partanen brings two twists to this standard analysis. The first is that
this book is part memoir: she fell in love with a US writer and made the
decision to move to the US rather than asking him to move to Finland. She
therefore experienced the transition between social and government systems
first-hand and writes memorably on the resulting surprise, trade-offs,
anxiety, and bafflement. The second, which I've not seen previously in
this policy debate, is a fascinating argument that Finland is a far more
individualistic country than the United States precisely because of its
policy differences.
Most people, including myself, assumed that part of what made the
United States a great country, and such an exceptional one, was that
you could live your life relatively unencumbered by the downside of a
traditional, old-fashioned society: dependency on the people you
happened to be stuck with. In America you had the liberty to express
your individuality and choose your own community. This would allow
you to interact with family, neighbors, and fellow citizens on the
basis of who you were, rather than on what you were obligated to do or
expected to be according to old-fashioned thinking.
The longer I lived in America, therefore, and the more places I
visited and the more people I met and the more American I myself
became, the more puzzled I grew. For it was exactly those key
benefits of modernity (freedom, personal independence, and
opportunity) that seemed, from my outsider's perspective, in a
thousand small ways to be surprisingly missing from American life
today. Amid the anxiety and stress of people's daily lives, those
grand ideals were looking more theoretical than actual.
The core of this argument is that the structure of life in the United
States essentially coerces dependency on other people: employers, spouses,
parents, children, and extended family. Because there is no universally
available social support system, those relationships become essential for
any hope of a good life, and often for survival. If parents do not
heavily manage their children's education, there is a substantial risk of
long-lasting damage to the stability and happiness of their life. If
children do not care for their elderly parents, they may receive no care
at all. Choosing not to get married often means choosing precarity and
exhaustion because navigating society without pooling resources with
someone else is incredibly difficult.
It was as if America, land of the Hollywood romance, was in practice
mired in a premodern time when marriage was, first and foremost, not
an expression of love, but rather a logistical and financial pact to
help families survive by joining resources.
Partanen contrasts this with what she calls the Nordic theory of love:
What Lars Trägårdh came to understand during his years in the United
States was that the overarching ambition of Nordic societies during
the course of the twentieth century, and into the twenty-first, has
not been to socialize the economy at all, as is often mistakenly
assumed. Rather the goal has been to free the individual from all
forms of dependency within the family and in civil society: the poor
from charity, wives from husbands, adult children from parents, and
elderly parents from their children. The express purpose of this
freedom is to allow all those human relationships to be unencumbered
by ulterior motives and needs, and thus to be entirely free,
completely authentic, and driven purely by love.
She sees this as the common theme through most of the policy differences
discussed in this book. The Finnish approach is to provide neutral and
universal logistical support for most of life's expected challenges:
birth, child-rearing, education, health, unemployment, and aging. This
relieves other social relations (family, employer, church) of the
corrosive strain of dependency and obligation. It also ensures people's
basic well-being isn't reliant on accidents of association.
If the United States is so worried about crushing entrepreneurship and
innovation, a good place to start would be freeing start-ups and
companies from the burdens of babysitting the nation's citizens.
I found this fascinating as a persuasive technique. Partanen embraces the
US ideal of individualism and points out that, rather than being
collectivist as the US right tends to assume, Finland is better at
fostering individualism and independence because the government works to
remove unnecessary premodern constraints on individual lives. The reason
why so many Americans are anxious and frantic is not a personal failing or
bad luck. It's because the US social system is deeply hostile to healthy
relationships and individual independence. It demands a constant level of
daily problem-solving and crisis management that is profoundly exhausting,
nearly impossible to navigate alone, and damaging to the ideal of equal
relationships.
Whether this line of argument will work is another question, and I'm
dubious for reasons that Partanen (probably wisely) avoids. She presents
the Finnish approach as a discovery that the US would benefit from, and
the US approach as a well-intentioned mistake. I think this is
superficially appealing; almost all corners of US political belief at
least give lip service to individualism and independence. However,
advocates of political change will eventually need to address the fact
that many US conservatives see this type of social coercion as an intended
feature of society rather than a flaw.
This is most obvious when one looks at family relationships. Partanen
treats the idea that marriage should be a free choice between equals
rather than an economic necessity as self-evident, but there is a
significant strain of US political thought that embraces punishing people
for not staying within the bounds of a conservative ideal of family. One
will often find, primarily but not exclusively among the more religious, a
contention that the basic unit of society is the (heterosexual,
patriarchal) family, not the individual, and that the suffering of anyone
outside that structure is their own fault. Not wanting to get married, be
the primary caregiver for one's parents, or abandon a career in order to
raise children is treated as malignant selfishness and immorality rather
than a personal choice that can be enabled by a modern social system.
Here, I think Partanen is accurate to identify the Finnish social system
as more modern. It embraces the philosophical concept of modernity,
namely that social systems can be improved and social structures are not
timeless. This is going to be a hard argument to swallow for those who
see the pressure towards forming dependency ties within families as
natural, and societal efforts to relieve those pressures as government
meddling. In that intellectual framework, rather than an attempt to
improve the quality of life, government logistical support is perceived as
hostility to traditional family obligations and an attempt to replace
"natural" human ties with "artificial" dependence on government services.
Partanen doesn't attempt to have that debate.
Two other things struck me in this book. The first is that, in Partanen's
presentation, Finns expect high-quality services from their government and
work to improve it when it falls short. This sounds like an obvious
statement, but I don't think it is in the context of US politics, and
neither does Partanen. She devotes a chapter to the topic, subtitled "Go
ahead: ask what your country can do for you."
This is, to me, one of the most frustrating aspects of US political
debate. Our attitude towards government is almost entirely hostile and
negative even among the political corners that would like to see
government do more. Failures of government programs are treated as
malice, malfeasance, or inherent incompetence: in short, signs the program
should never have been attempted, rather than opportunities to learn and
improve. Finland had mediocre public schools, decided to make them
better, and succeeded. The moment US public schools start deteriorating,
we throw much of our effort into encouraging private competition and
dismantling the public school system.
Partanen doesn't draw this connection, but I see a link between the US
desire for market solutions to societal problems and the level of
exhaustion and anxiety that is so common in US life. Solving problems by
throwing them open to competition is a way of giving up, of saying we have
no idea how to improve something and are hoping someone else will figure
it out for a profit. Analyzing the failures of an existing system and
designing incremental improvements is hard and slow work. Throwing out
the system and hoping some corporation will come up with something better
is disruptive but easy.
When everyone is already overwhelmed by life and devoid of energy to
work on complex social problems, it's tempting to give up on compromise
and coalition-building and let everyone go their separate ways on their
own dime. We cede the essential work of designing a good society to
start-ups. This creates a vicious cycle: the resulting market solutions
are inevitably gated by wealth and thus precarious and artificially
scarce, which in turn creates more anxiety and stress. The short-term
energy savings from not having to wrestle with a hard problem are
overwhelmed by the long-term cost of having to navigate a complex and
adversarial economic relationship.
That leads into the last point: schools. There's a lot of discussion here
about school quality and design, which I won't review in detail but which
is worth reading. What struck me about Partanen's discussion, though, is
how easy the Finnish system is to use. Finnish parents just send their
kids to the most convenient school and rarely give that a second thought.
The critical property is that all the schools are basically fine, and
therefore there is no need to place one's child in an exceptional school
to ensure they have a good life.
It's axiomatic in the US that more choice is better. This is a constant
refrain in our political discussion around schools: parental choice,
parental control, options, decisions, permission, matching children to
schools tailored for their needs. Those choices are almost entirely
absent in Finland, at least in Partanen's description, and the amount of
mental and emotional energy this saves is astonishing. Parents simply
don't think about this, and everything is fine.
I think we dramatically underestimate the negative effects of constantly
having to make difficult decisions with significant consequences, and
drastically overstate the benefits of having every aspect of life be full
of major decision points. To let go of that attempt at control, however
illusory, people have to believe in a baseline of quality that makes the
choice less fraught. That's precisely what Finland provides by expecting
high-quality social services and working to fix them when they fall short,
an effort that the United States has by and large abandoned.
A lot of non-fiction books could be turned into long articles without
losing much substance, and I think The Nordic Theory of Everything
falls partly into that trap. Partanen repeats the same ideas from several
different angles, and the book felt a bit padded towards the end. If
you're already familiar with the policy comparisons between the US and
Nordic countries, you will have seen a lot of this before, and the book
bogs down when Partanen strays too far from memoir and personal reactions.
But the focus on individualism and eliminating dependency is new, at least
to me, and is such an illuminating way to look at the contrast that I
think the book is worth reading just for that.
Rating: 7 out of 10
Three Ancient Civilizations
I have been reading books for a long time, but somehow I don't know how or when I realized that there are three ancient civilizations that time and again seem to fascinate authors, whether American or European. The three civilizations that get mentioned every now and then are the Egyptian, Greek and Roman, and I have no clue as to why. Another two civilizations closely follow them: the Mesopotamian and Sumerian. Why most authors, irrespective of genre, are mystified by these five civilizations is beyond me, but they certainly conjure up a lot of imagination. Now if I were on the right side of 40, maybe from 22-25 onwards, and had the means or the opportunity or both, I would have gone and tried to learn as much as I could about these various civilizations. There is still a lot of enigma attached to them, and the official explanations of how these various civilizations ended seem too good to be true. One of the more interesting points has been how Greek mythology was subverted and flipped to make Roman mythology. Apparently, many Greek gods have a Roman aspect whose qualities are opposite to the Greek one. I haven't learned the reason why that is, yet.
Apart from the Iziko Natural Museum in South Africa, which did have its share of wonders (the whole South African experience was surreal), I have not witnessed artefacts from the above anywhere other than in Africa. I could go on, but I don't know what is real and what is mythology when it comes to various cultures, including whatever little I experienced in Africa.
Recent History
Having said the above, I also find that many Indians either do not know or are not interested in understanding recent history. I am talking of the time between the 1800s and the 1950s. British apologist Mr. William Dalrymple, in his book The Anarchy, has shared how the East India Company looted India and gave less than half back to the British. So where did the rest of the money go? It basically went to the tax-free havens around the UK. That tale is very similar to how Axis gold was used to build up an entity called Switzerland from scratch. There have been books, as well as a couple of miniseries, that document that fact. But for me this is not the real story; this is more of a side dish per se.
The real story perhaps is how the EU peace project was born. Because of Hitler's rise and the way World War 2 happened, the Allies knew that they couldn't subjugate Germany through another humiliation as they had after World War 1; who knows, another Hitler might have come. So while they did fine the Germans, they also helped them via the Marshall Plan. Germany also realized that it had been warring with other European countries, as well as quite a few others, for over a thousand years. The first treaty in that regard was the Western Union alliance. On the face of it, it was a defence treaty between sovereign powers. Interestingly, the UK at that point in time wanted more countries to be part of the Treaty of Dunkirk. This was quickly followed two years later by the Treaty of London, which made way for the Council of Europe to be formed. The reason I am sharing this is that a lot of Indians whom I meet on social media do not know that the UK was instrumental, front and center, in the formation of the European Union (EU). They also perhaps don't realize that after World War 2 the UK was greatly diminished both financially and militarily. That is the reason it gave back its erstwhile colonies, including India and other countries. If the UK hadn't become a part of the EU, it would have been called the poor man of Europe, as it had been a few hundred years ago.
In fact, after Brexit, the UK has been the only nation to fall on such dire times. They have practically no food, supermarkets running out of veggies and whatnot, and electricity prices are sky-high because they don't want to curtail the profits of the gas companies, which in turn donate money to Tory coffers. Sadly, most of my brethren do not know that, hence I have had to share it. The EU became a power in its own right due to the gravity model of trade. The U.S. is attempting the same thing and calls it near-shoring.
Tesla, Investor Day Presentation, Toyota and Free Software
The above brings us nicely to the Tesla Investor Day that happened a couple of weeks back. The biggest news, though, was broken just 24 hours earlier, when the governor of Monterrey announced that Giga Mexico would be happening and shared some site photographs. There have been bits of news on that off and on since then. Toyota, meanwhile, has been putting up a lot of anti-EV posters and whatnot. In fact, India seems to be mirroring Japan in a lot of ways; the only difference is that Japan is still economically better off than India. I do sincerely hope that at least Japan can get out of its lost decades. I have asked privately if any of the Japanese translators would be willing to translate the anti-EV poster so we may all know what it says.
Urinal outside UK Embassy
About a week back, a few people chanting Khalistan slogans entered the Indian Embassy and displayed their flags. This was in the UK. Now, in a retaliatory move against a friendly nation, we want to build a urinal outside the UK embassy, after downgrading the embassy's security. Such is the pettiness being shown by the Indian Govt., especially when it has very few friends. There are also plans to do the same for the U.S. and other embassies as well. I do not know when we will get common sense.
Holi
Just a few weeks back, Holi happened. It used to be a sweet and innocent festival. But over the last few years, I have been hearing about and seeing sexual harassment on the rise. In fact, I saw quite a few lewd posts directed at women on or around Holi, as well as videos of the same. One of the most shameful incidents happened to a Japanese tourist. She was not only sexually molested but also threatened that if she were to report it, she could be raped and murdered. She promptly left Delhi. Once it was reported in the mass media, Delhi Police tried to show it was doing something. FWIW, Delhi's crime statistics for crimes against women are at a record high and convictions at a record low.
Ecommerce Rules being changed, logistics gonna be tough.
Just today there have been changes to the e-commerce rules (again), and this is gonna be a pain for almost all companies big and small, with the exception of Adani and Ambani. Almost all players, including the Tatas, have called them out. Of course, all such laws have been passed without debate. In such a scenario, small startups like these cannot hope to grow their business.
Inertial Measurement Unit or Full Body Tracking with cheap hardware.
Apparently, a whole host of companies are looking at 3-D tracking using cheap hardware known as IMU sensors. Sooner rather than later, cheap 3-D glasses and IMU sensors should blow the 3-D market wide open worldwide. There is huge potential and upside to it, and it will probably overtake smartphones as well. But then the danger will be of our thoughts, ideas, nightmares etc. being shared without our consent. The more we tap into a virtual world, what stops anybody from tapping into our brains and practically stealing our identity in more than one way? I dunno if there are any Debian people or FSF projects working on the above. Even laws can only do so much; until and unless there are alternative places and ways, it would be difficult, to say the least.
Yesterday, the UBports core developer team released Ubuntu Touch Focal OTA-1
(In fact, Raoul, Marius and I were in a conference call when Marius froze and said: "The PR team already posted the release blog post; the post is out, but we haven't released yet... ahhhh... panic... Shall I?" And we said: "GO!!!" This is why the release occurred in public five hours ahead of schedule. OMG.)
For all the details, please study: https://ubports.com/blog/ubports-news-1/post/ubuntu-touch-ota-1-focal-re...
Credits
Thanks to all the developers, other contributors and funding providers that helped to reach this massive milestone. I dare to drop some names here at the risk of forgetting others (I put them in alphanumerical order): Alan, Alfred, Brian, Christoffer, Daniel, Eline, Florian, Guido, Jami, Jonathan, Kugi, Lionel, Maciek, Mardy, Marius, Mike, Nigel, Nikita, Raoul, Ratchanan, Robert, Sergey.
I have been involved in the development and release process over the past four years and I feel honoured to work with so many fine and genuine people on such a unique project.
It is a pleasure to work with you guys!!!
Also a big thanks to the UBports Foundation and its BoD for being the umbrella organisation of all Ubuntu Touch related initiatives.
Consumer-Ready
Ubuntu Touch is one of the very few Open Source projects that brings forth a 100% FLOSS phone operating system.
After using Ubuntu Touch myself for several months now, I can confirm that it is a consumer grade OS that can be used by non-tech people as a daily driver for mobile communications and connectivity.
Go for it and try it out.
The importance of clean, fresh indoor air is one of the most tangible takeaways of the Covid-19 pandemic. In addition to being an effective risk mitigation strategy for reducing the spread of respiratory illnesses, clean, fresh air is necessary to enable effective cognitive performance.
Monitoring indoor air quality is relatively easy to do, but traditionally has not been a key focus. I believe air quality monitoring should be accessible for any indoor space, and for highly occupied indoor spaces should be provided on a continuous basis. This post explores the need and an opportunity for a business that can accelerate the adoption of ventilation monitoring through the following topics:
The importance of indoor air quality
Clean, fresh air is fundamental to life and health. That might sound obvious, but unfortunately being obvious is not enough to ensure the air we breathe is in fact always clean and healthy. Repeated studies have revealed that in many cases the air you're breathing at school, on the bus or at work, and probably also at home, falls well below the ideal of what clean, fresh air should be.
Unclean air has potential long-term health impacts and has also been shown to lower cognitive performance, impacting the ability to learn and work, as well as increasing the risk of transmission of respiratory illnesses like Covid-19 and the flu.
Ventilation (replacing old stale air with clean fresh air) is the most effective and economical method of improving and maintaining high indoor air quality. Most New Zealand buildings (including schools and houses) are designed to rely on manual ventilation (opening windows), while newer buildings, often including larger or commercial buildings may use mechanical ventilation involving fans and ducts. Mechanical ventilation including filtration may also be required in situations where the outdoor air is not clean and fresh such as in a city or next to a busy intersection.
Observing the invisible
Overall air quality is a complex topic involving many contributing factors, many of which are invisible and not perceptible to us until well after adverse effects or irritation occur. This complexity and lack of visible signal is a large contributing factor to the ignorance and lack of attention towards indoor air quality that is prevalent in most buildings and indoor spaces today. Our attention is biased towards the risks that we can see, and this default bias has not been helped by hesitation and resistance to the idea that aerosol transmission and air quality is an important factor in preventing disease transmission that has only recently started to change. Zeynep Tufekci has a great overview that provides fascinating context for how an overreaction to the early incorrect theories of bad air and miasma causing disease contributed to aerosol transmission and air quality being incorrectly neglected for so long.
Correcting this history of inattention to indoor air quality is going to take time and effort, but one significant step that we can take to help start the journey to ensuring all indoor spaces have clean, healthy air is to make the invisible part visible. The concentration of carbon dioxide (CO2) in a space is an incredibly effective and easy to measure proxy for the ventilation of a space. The atmospheric background level of CO2 is around 420 parts per million (ppm), while our exhaled breath has concentrations as high as 40,000 ppm. Without effective ventilation, one or more people breathing in an enclosed space will rapidly lead to an observable increase in CO2 concentration, which in turn provides a signal that the ventilation is insufficient and needs to be improved.
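To get a feel for how quickly this build-up happens, here is a rough back-of-the-envelope sketch in Python; the figure of roughly 0.3 litres of CO2 exhaled per minute by a seated adult is an assumption (a commonly cited approximation), not a value from this post.
# Rough estimate of CO2 build-up in an unventilated room.
ROOM_VOLUME_L = 50 * 1000              # a 50 m^3 room, expressed in litres
CO2_PER_PERSON_L_PER_HOUR = 0.3 * 60   # assumed ~0.3 L/min per seated adult
people = 2
rise_ppm_per_hour = people * CO2_PER_PERSON_L_PER_HOUR / ROOM_VOLUME_L * 1_000_000
print(f"~{rise_ppm_per_hour:.0f} ppm per hour above the ~420 ppm background")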
Monitoring CO2 and improving ventilation is not a panacea for all possible air quality issues, but for the majority of buildings and indoor spaces, using CO2 as a proxy for ventilation and increasing ventilation when CO2 levels rise above recommended levels is a simple, effective and achievable approach that will deliver improvements in cognitive performance and reduction in the risk of disease transmission with few, if any, downsides or risks. See this Public Health Communication Centre briefing for a more detailed explanation.
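As a concrete illustration of acting on such readings, here is a minimal Python sketch that maps a CO2 reading to a ventilation action; the 800 ppm and 1500 ppm thresholds are assumptions loosely based on commonly cited guidance, not values taken from the briefing linked above.
def ventilation_advice(co2_ppm):
    # Assumed thresholds: below ~800 ppm ventilation is usually considered
    # adequate; above ~1500 ppm it is clearly insufficient.
    if co2_ppm < 800:
        return "OK: ventilation adequate"
    if co2_ppm < 1500:
        return "Warning: open windows or increase ventilation"
    return "Action needed: ventilate now and consider reducing occupancy"

for reading in (450, 950, 1800):
    print(reading, "ppm ->", ventilation_advice(reading))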
Adding clean air to our hygiene practices
We have well established expectations of hygiene for the food we eat and the water we drink and these expectations are codified in regulations that ensure those providing these services do so in a way that gives us confidence that we re not going to be at risk of illness. You may recall seeing food grade ratings prominently displayed on the walls of restaurants and cafes that you visit as an example of this.
Why should the air we breathe be treated any differently?
I think there is a strong argument that indoor air quality deserves regulation, both of the absolute quality of the air and ensuring that the practices and achieved air quality are clearly advertised and available. Ventilation monitoring via measurement of CO2 concentration provides an effective and achievable method that can be used to achieve this, and countries like Belgium and Japan are already starting to regulate indoor air quality. In the UK, the independent SAGE group of scientists has published "Scores on the Doors", a proposal which demonstrates how CO2 monitoring can be helpful in providing information about the air quality of indoor spaces.
Unfortunately there is no movement in any of these directions in New Zealand yet, and no sign that regulation or even a basic campaign to raise awareness of ventilation and air quality is even being planned. This is disappointing, but even if such work was planned, it would still require appropriate ventilation monitoring products and services to enable it, and while there are some options available, it is not a fully solved space yet.
Existing ventilation monitoring options
Until recently the available offerings for ventilation monitoring have sat at two distinct ends of the price and quality spectrum:
Handheld air quality meters advertised as measuring CO2, but in reality reporting only an approximation. These meters do not contain actual CO2 sensors, and only approximate CO2 levels based on measurements of other components of the air. While cheap (often less than $100), these meters are not useful for providing reliable data that can be systematically used to assess and improve ventilation and should be avoided.
High-end building management systems (BMS), and industrial measurement products targeted at large buildings such as offices or commercial applications such as food production. These systems require specialist installation, often integrated with large whole-building air conditioning systems. These systems, if appropriately configured, can be a great solution for the types of buildings and spaces that can afford them, but by their nature and cost, they do not offer a solution for the majority of smaller buildings and indoor spaces where we tend to spend a lot of our time.
Over the last few years a growing number of companies have developed products that fit in between the unreliable air quality meters and the expensive BMS/industrial measurement products. Promising NZ-based options in this space include Air Suite, Tether and Monkeytronics. These products are wall mountable, resemble a smoke alarm and utilise a WiFi network to report their measurements to a supporting web service. Pricing varies between $200 and $300 ex GST per unit.
Aranet, while not NZ-based, provides a handheld monitor, the Aranet4 Home, which is well regarded for quality and accuracy. Aranet4 Home devices are the most expensive in this space, retailing at $386 ex GST, and offer a clunkier and less convenient set of connectivity options via a Bluetooth connection to an associated phone. Obtaining reporting functionality similar to the other products requires upgrading to their Pro model and purchasing a separate base station, at a combined cost of $1255 ex GST.
Outside the commercial product offerings are a number of open source DIY options, which can be built by anyone with basic electronics knowledge. AirGradient is a leading example based in Thailand, and within New Zealand Oliver Seiler's CO2 Monitor provides similar functionality. These open source options have a parts cost in the $100-$150 range, depending on the volume built, and provide high-quality measurements via trusted CO2 sensors while also offering huge flexibility in terms of how they operate, interact with users and integrate with potential supporting web services.
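To illustrate how little software such a DIY design needs, here is a hypothetical Python sketch of the reporting loop: it assumes a sensor that prints one integer CO2 reading per line on a serial port, the device path and web-service URL are placeholders, and the pyserial and requests libraries are assumed to be installed.
import time
import requests
import serial  # pyserial

PORT = "/dev/ttyUSB0"                      # placeholder serial device for the sensor
ENDPOINT = "https://example.invalid/co2"   # placeholder reporting endpoint

with serial.Serial(PORT, 9600, timeout=5) as sensor:
    while True:
        line = sensor.readline().decode("ascii", errors="ignore").strip()
        if line.isdigit():
            # A real service would also want a device identifier and authentication.
            requests.post(ENDPOINT, json={"co2_ppm": int(line), "ts": time.time()}, timeout=10)
        time.sleep(60)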
An opportunity: Small businesses and organisations
While a growing number of high-quality CO2 monitors has the potential to help drive increased adoption of ventilation monitoring, the plethora of small businesses and organisations that own, operate and manage many of the indoor spaces we visit on a day-to-day basis do not appear to be well served by these existing products.
To deploy ventilation monitoring, a small business or organisation needs to first become aware of the need or demand for it, and then have a simple and easy path to acquire and install the monitor and access the data. Little to no marketing or demand generation appears to be targeted towards this market by the existing businesses, and, tellingly, several of the products are not directly available for sale, requiring interaction with a salesperson to purchase. This indicates a focus on selling to larger customers who have a campus or portfolio of buildings and will purchase in larger quantities than the typical small business or organisation will.
Small businesses and organisations are likely to occupy smaller buildings and spaces where manual ventilation is the prevalent method of improving and maintaining air quality. Maintaining clean, fresh air via manual ventilation requires the occupants of the space to receive an obvious and straightforward signal when action (opening windows, etc) is required. While the products above all tend to provide some form of local feedback and display in the room, the indication provided and notification of when to take action is less obvious and prominent than would be ideal in a situation where manual ventilation is being relied upon.
Informally testing this opportunity with family and friends running small businesses over the last few months has resulted in promising feedback. One particular success story was the discovery of a fresh air duct on the air conditioning unit in a small office that had never been connected to the outside air and was simply recirculating air from the ceiling space back into an office! The resulting stuffiness and poor air quality had been noticed, but without the clear indication from the CO2 monitor that the air conditioning was actually making things worse, rather than better, the underlying issue had not been understood. With the issue fixed and the duct now connected, that business is now enjoying much more productive and healthy working conditions.
Next steps
Many small businesses and organisations are likely to have poor air quality and opportunities for improvement similar to the example above, waiting to be found and fixed, and the existing products are neither focused on nor ideal for the needs of this market.
I have spent some time over the past six months building a basic CO2 monitoring service that I have used to deploy ventilation monitoring to our local school, and a few other local businesses. There are a number of challenges that still need to be addressed in order to scale the business up, but I think there is a reasonable chance that I can build a viable business that offers an attractive and useful solution that would accelerate the deployment of ventilation monitoring for small businesses and organisations.
In an upcoming post, I will explain the foundations of the service that I have built to date, the challenges that need to be overcome and how I plan to evolve the service from the current prototype into a sustainable, bootstrapped business.
I've used hardware-backed OpenPGP keys since 2006, when I imported newly generated rsa1024 subkeys to an FSFE Fellowship card. This worked well for several years, and I recall buying more ZeitControl cards for multi-machine usage and backup purposes. As a side note, I recall being unsatisfied with the weak 1024-bit RSA subkeys; at the time my primary key was a somewhat stronger 1280-bit RSA key created back in 2002, but OpenPGP cards at the time didn't support more than 1024-bit RSA, and were (and still often are) also limited to power-of-two RSA key sizes, which I dislike.
I had my master key on disk with a strong password for a while, mostly to refresh the expiration time of the subkeys and to sign others' OpenPGP keys. At some point I stopped carrying around encrypted copies of my master key. That was my main setup when I migrated to a new, stronger 3744-bit RSA key with rsa2048 subkeys on a YubiKey NEO back in 2014. At that point, signing others' OpenPGP keys was a rare enough occurrence that I settled for bringing out my offline machine to perform this operation, transferring the public key to sign on USB sticks. In 2019 I re-evaluated my OpenPGP setup and ended up creating an offline Ed25519 key with subkeys on an FST-01G running Gnuk. My approach for signing others' OpenPGP keys was still to bring out my offline machine and sign things with the master secret, using USB sticks for storage and transport. Which meant I almost never did it, because it took too much effort. So my 2019-era Ed25519 key still only has a handful of signatures on it, since I had essentially stopped signing others' keys, which is the traditional way of getting signatures in return.
None of this caused any critical problem for me, because I continued to use my old 2014-era RSA3744 key in parallel with my new 2019-era Ed25519 key, since too many systems didn't handle Ed25519. However, during 2022 this changed, and the only remaining environment where I still used my RSA3744 key was Debian, which requires OpenPGP signatures on a new key before it may replace an older one. I was in denial about this sub-optimal situation during 2022 and endured its practical consequences, having to use the YubiKey NEO (which I had replaced with a permanently inserted YubiKey Nano at some point) for Debian-related purposes alone.
In December 2022 I bought a new laptop and set up an FST-01SZ with my Ed25519 key, and while I have taken a vacation from Debian, I continue to extend the expiration period on the old RSA3744 key in case I ever have to use it again, so the overall OpenPGP setup was still sub-optimal. Having two valid OpenPGP keys at the same time causes people to use both for email encryption (leaving me having to use both devices), and the WKD key discovery protocol doesn't like two valid keys either. At FOSDEM 23 I ran into Andre Heinecke of GnuPG, and I couldn't help complaining about how complex and unsatisfying all OpenPGP-related matters were; he mildly ignored my rant and asked why I didn't put the master key on another smartcard. The comment sunk in when I came home, and recently I connected all the dots. This post is a summary of what I did to move my offline OpenPGP master key to a Nitrokey Start.
First, a word about device choice: I still prefer to use hardware devices that are as compatible with free software as possible, but the FST-01G and FST-01SZ are no longer easily available for purchase. I got a comment about the Nitrokey Start in my last post, and had two of them available to experiment with. There are things to dislike about the Nitrokey Start compared to the YubiKey (e.g., a relatively insecure chip architecture, a bulkier form factor, and the lack of FIDO/U2F/OATH support), but as far as I know there is no more widely available owner-controlled device manufactured for the express purpose of implementing an OpenPGP card. Thus it hits the sweet spot for me.
The first step is to run the latest firmware on the Nitrokey Start, for bug fixes and the important OpenSSH 9.0 compatibility; reproducibly built firmware is published that you can install using pynitrokey. I run Trisquel 11 aramo on my laptop, which does not include the Python pip package (likely because it promotes installing non-free software), so that was a slight complication. Building the firmware locally may have worked, and I would like to do that eventually to confirm the published firmware; however, to save time I settled for installing the Ubuntu 22.04 packages on my machine:
$ sha256sum python3-pip*
ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325 python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
$ doas dpkg -i python3-pip*
...
$ doas apt install -f
...
$
Installing pynitrokey downloaded a bunch of dependencies, and it would be nice to audit the license and security vulnerabilities for each of them. (Verbose output below slightly redacted.)
jas@kaka:~$ pip3 install --user pynitrokey
Collecting pynitrokey
Downloading pynitrokey-0.4.34-py3-none-any.whl (572 kB)
Collecting frozendict~=2.3.4
Downloading frozendict-2.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (113 kB)
Requirement already satisfied: click<9,>=8.0.0 in /usr/lib/python3/dist-packages (from pynitrokey) (8.0.3)
Collecting ecdsa
Downloading ecdsa-0.18.0-py2.py3-none-any.whl (142 kB)
Collecting python-dateutil~=2.7.0
Downloading python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
Collecting fido2<2,>=1.1.0
Downloading fido2-1.1.0-py3-none-any.whl (201 kB)
Collecting tlv8
Downloading tlv8-0.10.0.tar.gz (16 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: certifi>=14.5.14 in /usr/lib/python3/dist-packages (from pynitrokey) (2020.6.20)
Requirement already satisfied: pyusb in /usr/lib/python3/dist-packages (from pynitrokey) (1.2.1.post1)
Collecting urllib3~=1.26.7
Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting spsdk<1.8.0,>=1.7.0
Downloading spsdk-1.7.1-py3-none-any.whl (684 kB)
Collecting typing_extensions~=4.3.0
Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: cryptography<37,>=3.4.4 in /usr/lib/python3/dist-packages (from pynitrokey) (3.4.8)
Collecting intelhex
Downloading intelhex-2.3.0-py2.py3-none-any.whl (50 kB)
Collecting nkdfu
Downloading nkdfu-0.2-py3-none-any.whl (16 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pynitrokey) (2.25.1)
Collecting tqdm
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting nrfutil<7,>=6.1.4
Downloading nrfutil-6.1.7.tar.gz (845 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (from pynitrokey) (1.15.0)
Collecting crcmod
Downloading crcmod-1.7.tar.gz (89 kB)
Preparing metadata (setup.py) ... done
Collecting libusb1==1.9.3
Downloading libusb1-1.9.3-py3-none-any.whl (60 kB)
Collecting pc_ble_driver_py>=0.16.4
Downloading pc_ble_driver_py-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting piccata
Downloading piccata-2.0.3-py3-none-any.whl (21 kB)
Collecting protobuf<4.0.0,>=3.17.3
Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting pyserial
Downloading pyserial-3.5-py2.py3-none-any.whl (90 kB)
Collecting pyspinel>=1.0.0a3
Downloading pyspinel-1.0.3.tar.gz (58 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from nrfutil<7,>=6.1.4->pynitrokey) (5.4.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil~=2.7.0->pynitrokey) (1.16.0)
Collecting pylink-square<0.11.9,>=0.8.2
Downloading pylink_square-0.11.1-py2.py3-none-any.whl (78 kB)
Collecting jinja2<3.1,>=2.11
Downloading Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting bincopy<17.11,>=17.10.2
Downloading bincopy-17.10.3-py3-none-any.whl (17 kB)
Collecting fastjsonschema>=2.15.1
Downloading fastjsonschema-2.16.3-py3-none-any.whl (23 kB)
Collecting astunparse<2,>=1.6
Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting oscrypto~=1.2
Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Collecting deepmerge==0.3.0
Downloading deepmerge-0.3.0-py2.py3-none-any.whl (7.6 kB)
Collecting pyocd<=0.31.0,>=0.28.3
Downloading pyocd-0.31.0-py3-none-any.whl (12.5 MB)
Collecting click-option-group<0.6,>=0.3.0
Downloading click_option_group-0.5.5-py3-none-any.whl (12 kB)
Collecting pycryptodome<4,>=3.9.3
Downloading pycryptodome-3.17-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting pyocd-pemicro<1.2.0,>=1.1.1
Downloading pyocd_pemicro-1.1.5-py3-none-any.whl (9.0 kB)
Requirement already satisfied: colorama<1,>=0.4.4 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (0.4.4)
Collecting commentjson<1,>=0.9
Downloading commentjson-0.9.0.tar.gz (8.7 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: asn1crypto<2,>=1.2 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.0)
Collecting pypemicro<0.2.0,>=0.1.9
Downloading pypemicro-0.1.11-py3-none-any.whl (5.7 MB)
Collecting libusbsio>=2.1.11
Downloading libusbsio-2.1.11-py3-none-any.whl (247 kB)
Collecting sly==0.4
Downloading sly-0.4.tar.gz (60 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml<0.18.0,>=0.17
Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting cmsis-pack-manager<0.3.0
Downloading cmsis_pack_manager-0.2.10-py2.py3-none-manylinux1_x86_64.whl (25.1 MB)
Collecting click-command-tree==1.1.0
Downloading click_command_tree-1.1.0-py3-none-any.whl (3.6 kB)
Requirement already satisfied: bitstring<3.2,>=3.1 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (3.1.7)
Collecting hexdump~=3.3
Downloading hexdump-3.3.zip (12 kB)
Preparing metadata (setup.py) ... done
Collecting fire
Downloading fire-0.5.0.tar.gz (88 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse<2,>=1.6->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.37.1)
Collecting humanfriendly
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting argparse-addons>=0.4.0
Downloading argparse_addons-0.12.0-py3-none-any.whl (3.3 kB)
Collecting pyelftools
Downloading pyelftools-0.29-py2.py3-none-any.whl (174 kB)
Collecting milksnake>=0.1.2
Downloading milksnake-0.1.5-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: appdirs>=1.4 in /usr/lib/python3/dist-packages (from cmsis-pack-manager<0.3.0->spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.4)
Collecting lark-parser<0.8.0,>=0.7.1
Downloading lark-parser-0.7.8.tar.gz (276 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2<3.1,>=2.11->spsdk<1.8.0,>=1.7.0->pynitrokey) (2.0.1)
Collecting asn1crypto<2,>=1.2
Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Collecting wrapt
Downloading wrapt-1.15.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)
Collecting future
Downloading future-0.18.3.tar.gz (840 kB)
Preparing metadata (setup.py) ... done
Collecting psutil>=5.2.2
Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting capstone<5.0,>=4.0
Downloading capstone-4.0.2-py2.py3-none-manylinux1_x86_64.whl (2.1 MB)
Collecting naturalsort<2.0,>=1.5
Downloading naturalsort-1.5.1.tar.gz (7.4 kB)
Preparing metadata (setup.py) ... done
Collecting prettytable<3.0,>=2.0
Downloading prettytable-2.5.0-py3-none-any.whl (24 kB)
Collecting intervaltree<4.0,>=3.0.2
Downloading intervaltree-3.1.0.tar.gz (32 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml.clib>=0.2.6
Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)
Collecting termcolor
Downloading termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting sortedcontainers<3.0,>=2.0
Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from prettytable<3.0,>=2.0->pyocd<=0.31.0,>=0.28.3->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.2.5)
Building wheels for collected packages: nrfutil, crcmod, sly, tlv8, commentjson, hexdump, pyspinel, fire, intervaltree, lark-parser, naturalsort, future
Building wheel for nrfutil (setup.py) ... done
Created wheel for nrfutil: filename=nrfutil-6.1.7-py3-none-any.whl size=898520 sha256=de6f8803f51d6c26d24dc7df6292064a468ff3f389d73370433fde5582b84a10
Stored in directory: /home/jas/.cache/pip/wheels/39/2b/9b/98ab2dd716da746290e6728bdb557b14c1c9a54cb9ed86e13b
Building wheel for crcmod (setup.py) ... done
Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31422 sha256=5149ac56fcbfa0606760eef5220fcedc66be560adf68cf38c604af3ad0e4a8b0
Stored in directory: /home/jas/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
Building wheel for sly (setup.py) ... done
Created wheel for sly: filename=sly-0.4-py3-none-any.whl size=27352 sha256=f614e413918de45c73d1e9a8dca61ca07dc760d9740553400efc234c891f7fde
Stored in directory: /home/jas/.cache/pip/wheels/a2/23/4a/6a84282a0d2c29f003012dc565b3126e427972e8b8157ea51f
Building wheel for tlv8 (setup.py) ... done
Created wheel for tlv8: filename=tlv8-0.10.0-py3-none-any.whl size=11266 sha256=3ec8b3c45977a3addbc66b7b99e1d81b146607c3a269502b9b5651900a0e2d08
Stored in directory: /home/jas/.cache/pip/wheels/e9/35/86/66a473cc2abb0c7f21ed39c30a3b2219b16bd2cdb4b33cfc2c
Building wheel for commentjson (setup.py) ... done
Created wheel for commentjson: filename=commentjson-0.9.0-py3-none-any.whl size=12092 sha256=28b6413132d6d7798a18cf8c76885dc69f676ea763ffcb08775a3c2c43444f4a
Stored in directory: /home/jas/.cache/pip/wheels/7d/90/23/6358a234ca5b4ec0866d447079b97fedf9883387d1d7d074e5
Building wheel for hexdump (setup.py) ... done
Created wheel for hexdump: filename=hexdump-3.3-py3-none-any.whl size=8913 sha256=79dfadd42edbc9acaeac1987464f2df4053784fff18b96408c1309b74fd09f50
Stored in directory: /home/jas/.cache/pip/wheels/26/28/f7/f47d7ecd9ae44c4457e72c8bb617ef18ab332ee2b2a1047e87
Building wheel for pyspinel (setup.py) ... done
Created wheel for pyspinel: filename=pyspinel-1.0.3-py3-none-any.whl size=65033 sha256=01dc27f81f28b4830a0cf2336dc737ef309a1287fcf33f57a8a4c5bed3b5f0a6
Stored in directory: /home/jas/.cache/pip/wheels/95/ec/4b/6e3e2ee18e7292d26a65659f75d07411a6e69158bb05507590
Building wheel for fire (setup.py) ... done
Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116951 sha256=3d288585478c91a6914629eb739ea789828eb2d0267febc7c5390cb24ba153e8
Stored in directory: /home/jas/.cache/pip/wheels/90/d4/f7/9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
Building wheel for intervaltree (setup.py) ... done
Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26119 sha256=5ff1def22ba883af25c90d90ef7c6518496fcd47dd2cbc53a57ec04cd60dc21d
Stored in directory: /home/jas/.cache/pip/wheels/fa/80/8c/43488a924a046b733b64de3fac99252674c892a4c3801c0a61
Building wheel for lark-parser (setup.py) ... done
Created wheel for lark-parser: filename=lark_parser-0.7.8-py2.py3-none-any.whl size=62527 sha256=3d2ec1d0f926fc2688d40777f7ef93c9986f874169132b1af590b6afc038f4be
Stored in directory: /home/jas/.cache/pip/wheels/29/30/94/33e8b58318aa05cb1842b365843036e0280af5983abb966b83
Building wheel for naturalsort (setup.py) ... done
Created wheel for naturalsort: filename=naturalsort-1.5.1-py3-none-any.whl size=7526 sha256=bdecac4a49f2416924548cae6c124c85d5333e9e61c563232678ed182969d453
Stored in directory: /home/jas/.cache/pip/wheels/a6/8e/c9/98cfa614fff2979b457fa2d9ad45ec85fa417e7e3e2e43be51
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492037 sha256=57a01e68feca2b5563f5f624141267f399082d2f05f55886f71b5d6e6cf2b02c
Stored in directory: /home/jas/.cache/pip/wheels/5e/a9/47/f118e66afd12240e4662752cc22cefae5d97275623aa8ef57d
Successfully built nrfutil crcmod sly tlv8 commentjson hexdump pyspinel fire intervaltree lark-parser naturalsort future
Installing collected packages: tlv8, sortedcontainers, sly, pyserial, pyelftools, piccata, naturalsort, libusb1, lark-parser, intelhex, hexdump, fastjsonschema, crcmod, asn1crypto, wrapt, urllib3, typing_extensions, tqdm, termcolor, ruamel.yaml.clib, python-dateutil, pyspinel, pypemicro, pycryptodome, psutil, protobuf, prettytable, oscrypto, milksnake, libusbsio, jinja2, intervaltree, humanfriendly, future, frozendict, fido2, ecdsa, deepmerge, commentjson, click-option-group, click-command-tree, capstone, astunparse, argparse-addons, ruamel.yaml, pyocd-pemicro, pylink-square, pc_ble_driver_py, fire, cmsis-pack-manager, bincopy, pyocd, nrfutil, nkdfu, spsdk, pynitrokey
WARNING: The script nitropy is installed in '/home/jas/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argparse-addons-0.12.0 asn1crypto-1.5.1 astunparse-1.6.3 bincopy-17.10.3 capstone-4.0.2 click-command-tree-1.1.0 click-option-group-0.5.5 cmsis-pack-manager-0.2.10 commentjson-0.9.0 crcmod-1.7 deepmerge-0.3.0 ecdsa-0.18.0 fastjsonschema-2.16.3 fido2-1.1.0 fire-0.5.0 frozendict-2.3.5 future-0.18.3 hexdump-3.3 humanfriendly-10.0 intelhex-2.3.0 intervaltree-3.1.0 jinja2-3.0.3 lark-parser-0.7.8 libusb1-1.9.3 libusbsio-2.1.11 milksnake-0.1.5 naturalsort-1.5.1 nkdfu-0.2 nrfutil-6.1.7 oscrypto-1.3.0 pc_ble_driver_py-0.17.0 piccata-2.0.3 prettytable-2.5.0 protobuf-3.20.3 psutil-5.9.4 pycryptodome-3.17 pyelftools-0.29 pylink-square-0.11.1 pynitrokey-0.4.34 pyocd-0.31.0 pyocd-pemicro-1.1.5 pypemicro-0.1.11 pyserial-3.5 pyspinel-1.0.3 python-dateutil-2.7.5 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 sly-0.4 sortedcontainers-2.4.0 spsdk-1.7.1 termcolor-2.2.0 tlv8-0.10.0 tqdm-4.65.0 typing_extensions-4.3.0 urllib3-1.26.15 wrapt-1.15.0
jas@kaka:~$
Then upgrading the device worked remarkably well, although I wish that the tool would have printed URLs and checksums for the firmware files to allow easy confirmation.
jas@kaka:~$ PATH=$PATH:/home/jas/.local/bin
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.15-5D271572: Nitrokey Nitrokey Start (RTM.12.1-RC2-modified)
jas@kaka:~$ nitropy start update
Command line tool to interact with Nitrokey devices 0.4.34
Nitrokey Start firmware update tool
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
System: Linux, is_linux: True
Python: 3.10.6
Saving run log to: /tmp/nitropy.log.gc5753a8
Admin PIN:
Firmware data to be used:
- FirmwareType.REGNUAL: 4408, hash: ...b'72a30389' valid (from ...built/RTM.13/regnual.bin)
- FirmwareType.GNUK: 129024, hash: ...b'25a4289b' valid (from ...prebuilt/RTM.13/gnuk.bin)
Currently connected device strings:
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.15-5D271572
Revision: RTM.12.1-RC2-modified
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
initial device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.15-5D271572', 'Revision': 'RTM.12.1-RC2-modified', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
Please note:
- Latest firmware available is:
RTM.13 (published: 2022-12-08T10:59:11Z)
- provided firmware: None
- all data will be removed from the device!
- do not interrupt update process - the device may not run properly!
- the process should not take more than 1 minute
Do you want to continue? [yes/no]: yes
...
Starting bootloader upload procedure
Device: Nitrokey Start FSIJ-1.2.15-5D271572
Connected to the device
Running update!
Do NOT remove the device from the USB slot, until further notice
Downloading flash upgrade program...
Executing flash upgrade...
Waiting for device to appear:
Wait 20 seconds.....
Downloading the program
Protecting device
Finish flashing
Resetting device
Update procedure finished. Device could be removed from USB slot.
Currently connected device strings (after upgrade):
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.19-5D271572
Revision: RTM.13
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
device can now be safely removed from the USB slot
final device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.19-5D271572', 'Revision': 'RTM.13', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
finishing session 2023-03-16 21:49:07.371291
Log saved to: /tmp/nitropy.log.gc5753a8
jas@kaka:~$
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.19-5D271572: Nitrokey Nitrokey Start (RTM.13)
jas@kaka:~$
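As noted above, the update tool does not print URLs or checksums for the firmware it installs. If you obtain the published digest through some other channel (or build the firmware yourself), a minimal Python sketch for confirming a local image could look like this; both the file name and the expected digest below are placeholders, not real values.
import hashlib

EXPECTED_SHA256 = "<published digest goes here>"  # placeholder
FIRMWARE_FILE = "gnuk.bin"                        # placeholder local file name

digest = hashlib.sha256(open(FIRMWARE_FILE, "rb").read()).hexdigest()
print("match" if digest == EXPECTED_SHA256 else "MISMATCH: " + digest)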
Before importing the master key to this device, it should be configured. Note the commands at the beginning that make sure scdaemon/pcscd is not running, because they may have cached state from earlier cards. Change the PIN codes as you like after this; my experience with Gnuk was that the Admin PIN had to be changed first, then you import the key, and then you change the PIN.
jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
OK
ERR 67125247 Slut på fil <GPG Agent>
jas@kaka:~$ ps auxww | grep -e pcsc -e scd
jas 11651 0.0 0.0 3468 1672 pts/0 R+ 21:54 0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......:
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE5D2715720000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q
gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
If the key generation does not succeed, please check the
documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card>
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
jas@kaka:~$
Once that is set up, bring out your offline machine, boot it, and mount your USB stick with the offline key. The paths below will be different, and this uses a somewhat unorthodox approach of working with fresh GnuPG configuration paths that I chose for the USB stick.
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ cp -a gnupghome-backup-masterkey gnupghome-import-nitrokey-5D271572
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ gpg --homedir $PWD/gnupghome-import-nitrokey-5D271572 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Secret key is available.
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Really move the primary key? (y/N) y
Please select where to store the key:
(1) Signature key
(3) Authentication key
Your selection? 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg>
Save changes? (y/N) y
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$
At this point it is useful to confirm that the Nitrokey has the master key available and that it is possible to sign statements with it, back on your regular machine:
jas@kaka:~$ gpg --card-status
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 1
KDF setting ......: on
Signature key ....: B1D2 BD13 75BE CB78 4CF4 F8C4 D73C F638 C53C 06BE
created ....: 2019-03-20 23:37:24
Encryption key....: [none]
Authentication key: [none]
General key info..: pub ed25519/D73CF638C53C06BE 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec> ed25519/D73CF638C53C06BE created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 5D271572
ssb> ed25519/80260EE8A9B92B2B created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> ed25519/51722B08FE4745A2 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> cv25519/02923D7EE76EBD60 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
jas@kaka:~$ echo foo | gpg -a --sign | gpg --verify
gpg: Signature made Thu Mar 16 22:11:02 2023 CET
gpg: using EDDSA key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$
Finally, to retrieve and sign a key, for example Andre Heinecke's, whose OpenPGP key identifier I could confirm from his business card:
jas@kaka:~$ gpg --locate-external-keys aheinecke@gnupg.com
gpg: key 1FDF723CF462B6B1: public key "Andre Heinecke <aheinecke@gnupg.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 2 signed: 7 trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1 valid: 7 signed: 64 trust: 7-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2023-05-26
pub rsa3072 2015-12-08 [SC] [expires: 2025-12-05]
94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1
uid [ unknown] Andre Heinecke <aheinecke@gnupg.com>
sub ed25519 2017-02-13 [S]
sub ed25519 2017-02-13 [A]
sub rsa3072 2015-12-08 [E] [expires: 2025-12-05]
sub rsa3072 2015-12-08 [A] [expires: 2025-12-05]
jas@kaka:~$ gpg --edit-key "94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1"
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
sub ed25519/2978E9D40CBABA5C
created: 2017-02-13 expires: never usage: S
sub ed25519/DC74D901C8E2DD47
created: 2017-02-13 expires: never usage: A
The following key was revoked on 2017-02-23 by RSA key 1FDF723CF462B6B1 Andre Heinecke <aheinecke@gnupg.com>
sub cv25519/1FFE3151683260AB
created: 2017-02-13 revoked: 2017-02-23 usage: E
sub rsa3072/8CC999BDAA45C71F
created: 2015-12-08 expires: 2025-12-05 usage: E
sub rsa3072/6304A4B539CE444A
created: 2015-12-08 expires: 2025-12-05 usage: A
[ unknown] (1). Andre Heinecke <aheinecke@gnupg.com>
gpg> sign
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
Primary key fingerprint: 94A5 C9A0 3C2F E5CA 3B09 5D8E 1FDF 723C F462 B6B1
Andre Heinecke <aheinecke@gnupg.com>
This key is due to expire on 2025-12-05.
Are you sure that you want to sign this key with your
key "Simon Josefsson <simon@josefsson.org>" (D73CF638C53C06BE)
Really sign? (y/N) y
gpg> quit
Save changes? (y/N) y
jas@kaka:~$
This is on my day-to-day machine, using the Nitrokey Start with the offline key. No need to boot the old offline machine just to sign keys or extend expiry anymore! At FOSDEM 23 I managed to get at least one DD signature on my new key, and the Debian keyring maintainers accepted my Ed25519 key. Hopefully I can now finally let my 2014-era RSA3744 key expire on 2023-09-19 and not extend it any further. This should finish my transition to a simpler OpenPGP key setup, yay!
The Framework is a 13.5" laptop body with swappable parts, which
makes it somewhat future-proof and certainly easily repairable,
scoring an "exceedingly rare" 10/10 score from ifixit.com.
There are two generations of the laptop's main board (both compatible
with the same body): the Intel 11th and 12th gen chipsets.
I have received my Framework, 12th generation "DIY", device in late
September 2022 and will update this page as I go along in the process
of ordering, burning-in, setting up and using the device over the
years.
Overall, the Framework is a good laptop. I like the keyboard, the
touch pad, the expansion cards. Clearly there's been some good work
done on industrial design, and it's the most repairable laptop I've
had in years. Time will tell, but it looks sturdy enough to survive me
many years as well.
This is also one of the most powerful devices I have ever laid my hands
on. I have managed, remotely, more powerful servers, but this is the
fastest computer I have ever owned, and it fits in this tiny case. It
is an amazing machine.
On the downside, there's a bit of proprietary firmware required (WiFi,
Bluetooth, some graphics) and the Framework ships with a proprietary
BIOS, with currently no Coreboot support. Expect to need the
latest kernel, firmware, and hacking around a bunch of things to get
resolution and keybindings working right.
Like others, I have first found significant power management issues,
but many issues can actually be solved with some configuration. Some
of the expansion ports (HDMI, DP, MicroSD, and SSD) use power when
idle, so don't expect week-long suspend, or "full day" battery while
those are plugged in.
Finally, the expansion ports are nice, but there's only four of
them. If you plan to have a two-monitor setup, you're likely going to
need a dock.
Read on for the detailed review. For context, I'm moving from the
Purism Librem 13v4 because it
basically exploded on me. I
had, in the meantime, reverted back to an old ThinkPad X220, so I
sometimes compare the Framework with that venerable laptop as well.
This blog post has been maturing for months now. It started in
September 2022 and I declared it completed in March 2023. It's the
longest single article on this entire website, currently clocking in at
about 13,000 words. It will take an average reader a full hour to go
through this thing, so I don't expect anyone to actually do
that. This introduction should be good enough for most people; read
the first section if you intend to actually buy a Framework. Jump
around the table of contents as you see fit after you buy the
laptop, as it might include some crucial hints on how to make it work
best for you, especially on (Debian) Linux.
Advice for buyers
Those are things I wish I would have known before buying:
consider buying 4 USB-C expansion cards, or at least a mix of 4
USB-A or USB-C cards, as they use less power than other cards and
you do want to fill those expansion slots, otherwise they snag
around and feel insecure
you will likely need a dock or at least a USB hub if you want a
two-monitor setup, otherwise you'll run out of ports
you have to do some serious tuning to get proper (10h+ idle, 10
days suspend) power savings
in particular, beware that the HDMI, DisplayPort and
particularly the SSD and MicroSD cards take a significant amount of
power, even when sleeping, up to 2-6W for the latter two
beware that the MicroSD card is what it says: Micro, normal SD
cards won't fit, and while there might be a full-sized one
eventually, it's currently only at the prototyping stage
Current status
I have the framework! It's set up with a fresh new Debian bookworm
installation. I've run through a large number of tests and burn-in.
I have decided to use the Framework as my daily driver, and had to buy
a USB-C dock to get my two monitors
connected, which was its own adventure.
Update: Framework just (2023-03-23) announced a whole bunch of
new stuff:
The recording is available in this video and it's not your
typical keynote. It starts ~25 minutes late, audio is crap, lighting
and camera are crap, clapping seems to be from whatever staff they
managed to get together in a room, decor is bizarre, colors are
shit. It's amazing.
Specifications
Those are the specifications of the 12th gen, in general terms. Your
build will of course vary according to your needs.
CPU: i5-1240P, i7-1260P, or i7-1280P (Up to 4.4-4.8 GHz, 4+8
cores), Iris Xe graphics
4 x USB-C user-selectable expansion ports, including
USB-C
USB-A
HDMI
DP
Ethernet
MicroSD
250-1000GB SSD
3.5mm combo headphone jack
Kill switches for microphone and camera
Battery: 55Wh
Camera: 1080p 60fps
Biometrics: Fingerprint Reader
Backlit keyboard
Power Adapter: 60W USB-C (or bring your own)
ships with a screwdriver/spludger
1 year warranty
base price: 1000$CAD, but doesn't give you much, typical builds
around 1500-2000$CAD
Actual build
This is the actual build I ordered. Amounts in CAD. (1CAD =
~0.75EUR/USD.)
Base configuration
CPU: Intel Core i5-1240P (AKA Alder Lake P 8 4.4GHz
P-threads, 8 3.2GHz E-threads, 16 total, 28-64W), 1079$
Memory: 16GB (1 x 16GB) DDR4-3200, 104$
Customization
Keyboard: US English, included
Expansion Cards
2 USB-C $24
3 USB-A $36
2 HDMI $50
1 DP $50
1 MicroSD $25
1 Storage 1TB $199
Sub-total: 384$
Accessories
Power Adapter - US/Canada $64.00
Total
Before tax: 1606$
After tax and duties: 1847$
Free shipping
Quick evaluation
This is basically the TL;DR here, just focusing on broad pros/cons of
the laptop.
Pros
easily repairable (complete with QR codes pointing to repair
guides!), the 11th gen received a 10/10 score from
ifixit.com, which they call "exceedingly rare", the 12th gen
has a similar hardware design and would probably rate similarly
replaceable motherboard!!! can be reused as a NUC-like device, with a
3d-printed case, 12th gen board can be bought standalone and
retrofitted into an 11th gen case
not a passing fad: they made a first laptop with the 11th gen Intel
chipset in 2021, and a second motherboard with the 12th Intel
chipset in 2022
four modular USB-C ports which can fit HDMI, USB-C (pass-through,
can provide power on both sides), USB-A, DisplayPort, MicroSD,
external storage (250GB, 1TB), active modding community
nice power led indicating power level (charging, charged, etc) when
plugged
they used to have some difficulty keeping up with the orders: first
two batches shipped, third batch sold out, fourth batch should have
shipped in October 2021. they generally seem to keep up with
shipping. update (august 2022): they rolled out a second line of
laptops (12th gen), first batch shipped, second batch shipped
late, September 2022 batch was generally on time, see this
spreadsheet for a crowdsourced effort to track those
supply chain issues seem to be under control as of early 2023. I
got the Ethernet expansion card shipped within a week.
Cons
compared to my previous laptop (Purism Librem
13v4), it feels strangely
bulkier and heavier; it's actually lighter than the Purism (1.3kg
vs 1.4kg) and thinner (15.85mm vs 18mm) but the design of the
Purism laptop (tapered edges) makes it feel thinner
no space for a 2.5" drive
rather bright LED around the power button; it can be dimmed in the
BIOS (not low enough to my taste), but I got used to it
fan quiet when idle, but can be noisy when running, for example if
you max a CPU for a while
battery described as "mediocre" by Ars Technica (above), confirmed
poor in my tests (see below)
no RJ-45 port, and attempts at designing ones are failing
because the modular plugs are too thin to fit (according to Linux
After Dark), so unlikely to have one in the future
Update: they cracked that nut and ship a 2.5 Gbps Ethernet
expansion card with a Realtek chipset, without any
firmware blob
a bit pricey for the performance, especially when compared to the
competition (e.g. Dell XPS, Apple M1)
12th gen Intel has glitchy graphics, seems like Intel hasn't fully
landed proper Linux support for that chipset yet
Initial hardware setup
A breeze.
Accessing the board
The internals are accessed through five Torx screws, but there's a nice
screwdriver/spudger that works well enough. The screws actually hold in
place so you can't even lose them.
The first setup is a bit counter-intuitive coming from the Librem
laptop, as I expected the back cover to lift and give me access to the
internals. But instead the screws release the keyboard and touch
pad assembly, so you actually need to flip the laptop back upright and
lift the assembly off to get access to the internals. Kind of
scary.
I also actually unplugged a connector while lifting the assembly because
I lifted it towards the monitor, while you actually need to lift it
to the right. Thankfully, the connector didn't break, it just
snapped off and I could plug it back in, no harm done.
Once there, everything is well indicated, with QR codes all over the
place supposedly leading to online instructions.
Bad QR codes
Unfortunately, the QR codes I tested (in the expansion card slot, the
memory slot and CPU slots) did not actually work so I wonder how
useful those actually are.
After all, they need to point to something and that means a URL, a
running website that will answer those requests forever. I bet those
will break sooner than later and in fact, as far as I can tell, they
just don't work at all. I prefer the approach taken by the MNT reform
here which designed (with the 100 rabbits folks) an actual paper
handbook (PDF).
The first QR code that's immediately visible from the back of the
laptop, in an expansion cord slot, is a 404. It seems to be some
serial number URL, but I can't actually tell because, well, the page
is a 404.
I was expecting that bar code to lead me to an introduction page,
something like "how to setup your Framework laptop". Support actually
confirmed that it should point to a quickstart guide. But in a
bizarre twist, they somehow sent me the URL with the plus (+) signs
escaped, like this:
(They have also "let the team know about this for feedback and help
resolve the problem with the link" which is a support code word for
"ha-ha! nope! not my problem right now!" Trust me, I know, my own
code word is "can you please make a ticket?")
Seating disks and memory
The "DIY" kit doesn't actually have that much of a setup. If you
bought RAM, it's shipped outside the laptop in a little plastic case,
so you just seat it in as usual.
Then you insert your NVMe drive, and, if that's your fancy, you also
install your own mPCI WiFi card. If you ordered one (which was my
case), it's pre-installed.
Closing the laptop is also kind of amazing, because the keyboard
assembly snaps into place with magnets. I have actually used the
laptop with the keyboard unscrewed as I was putting the drives in and
out, and it actually works fine (and will probably void your warranty,
so don't do that). (But you can.) (But don't, really.)
Hardware review
Keyboard and touch pad
The keyboard feels nice, for a laptop. I'm used to mechanical keyboards
and I'm rather violent with those poor things. Yet the key travel is
nice and it's clickety enough that I don't feel too disoriented.
At first, the keyboard felt more laggy than my normal
workstation setup, but it turned out this was a graphics driver
issue. After enabling a compositing manager, everything feels snappy.
The touch pad feels good. The double-finger scroll works well enough,
and I don't have to wonder too much where the middle button is, it
just works.
Taps don't work, out of the box: that needs to be enabled in Xorg,
with something like this:
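A minimal libinput snippet along these lines should do it; the file name and matching rules here are just a sketch of the usual approach, adjust to taste:

Section "InputClass"
    # match any touchpad handled by the libinput driver
    Identifier "libinput touchpad"
    MatchIsTouchpad "on"
    MatchDevicePath "/dev/input/event*"
    Driver "libinput"
    # enable tap-to-click
    Option "Tapping" "on"
EndSection

Save it as, say, /etc/X11/xorg.conf.d/40-libinput-touchpad.conf and restart Xorg.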
But be aware that once you enable that tapping, you'll need to deal
with palm detection... So I have not actually enabled this in the end.
Power button
The power button is a little dangerous. It's quite easy to hit, as
it's right next to one expansion card where you are likely to plug in
a power cable. And because the expansion cards are kind of hard to
remove, you might squeeze the laptop (and the power key) when trying
to remove the expansion card next to the power button.
So obviously, don't do that. But that's not very helpful.
An alternative is to make the power button do something else. With
systemd-managed systems, it's actually quite easy. Add a
HandlePowerKey stanza to (say)
/etc/systemd/logind.conf.d/power-suspends.conf:
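A minimal sketch of that drop-in (the directory may need to be created first):

[Login]
HandlePowerKey=suspend

Restart systemd-logind (or reboot) for it to take effect.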
And the power button will suspend! Long-press to power off doesn't
actually work as the laptop immediately suspends...
Note that there's probably half a dozen other ways of doing this,
see this, this, or that.
Special keybindings
There is a series of "hidden" (as in: not labeled on the key)
keybindings related to the fn keybinding that I actually
find quite useful.
Key  Equivalent  Effect                  Command
p    Pause       lock screen             xset s activate
b    Break       ?                       ?
k    ScrLk       switch keyboard layout  N/A
It looks like those are defined in the microcontroller so it
would be possible to add some. For example, the SysRq key
is almost bound to fn+s in there.
Note that most other shortcuts like this are clearly documented
(volume, brightness, etc). One key that's less obvious is
F12 that only has the Framework logo on it. That actually
calls the keysym XF86AudioMedia which, interestingly, does
absolutely nothing here. By default, on Windows, it opens your
browser to the Framework website and, on Linux, your "default
media player".
The keyboard backlight can be cycled with fn-space. The
dimmer version is dim enough, and the keybinding is easy to find in
the dark.
A skinny elephant would be performed with alt+PrtScr (above F11)+KEY, so for
example alt+fn+F11+b
should do a hard reset. This comment suggests you need to hold
the fn key only if "function lock" is on, but that's
actually the opposite of my experience.
Out of the box, some of the fn keys don't work. Mute,
volume up/down, brightness, monitor changes, and the airplane mode key
all do basically nothing. They don't send proper keysyms to Xorg at
all.
This is a known problem and it's related to the fact that the
laptop has light sensors to adjust the brightness
automatically. Somehow some of those keys (e.g. the brightness
controls) are supposed to show up as a different input device, but
don't seem to work correctly. It seems like the solution is for the
Framework team to write a driver specifically for this, but so far no
progress since July 2022.
In the meantime, the fancy functionality can supposedly be disabled with:
echo 'blacklist hid_sensor_hub' | sudo tee /etc/modprobe.d/framework-als-blacklist.conf
Kill switches
The Framework has two "kill switches": one for the camera and the
other for the microphone. The camera one actually disconnects the USB
device when turned off, and the mic one seems to cut the circuit. It
doesn't show up as muted, it just stops feeding the sound.
Both kill switches are around the main camera, on top of the monitor,
and quite discreet. They turn "red" when enabled (i.e. "red" means
"turned off").
Monitor
The monitor looks pretty good to my untrained eyes. I have yet to do
photography work on it, but some photos I looked at look sharp and the
colors are bright and lively. The blacks are dark and the screen is
bright.
I have yet to use it in full sunlight.
The dimmed light is very dim, which I like.
Screen backlight
I bind brightness keys to xbacklight in i3, but out of the box I get
this error:
sep 29 22:09:14 angela i3[5661]: No outputs have backlight property
It just requires this blob in /etc/X11/xorg.conf.d/backlight.conf:
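The usual blob is a device section along these lines, assuming the intel Xorg driver (from xserver-xorg-video-intel):

Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    # expose the intel_backlight sysfs interface to xbacklight
    Option     "Backlight" "intel_backlight"
EndSection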
This way I can control the actual backlight power with the brightness
keys, and they do significantly reduce power usage.
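For reference, the i3 side of this is just a couple of bindsym lines, something like (key names as reported by xev):

bindsym XF86MonBrightnessUp exec --no-startup-id xbacklight -inc 10
bindsym XF86MonBrightnessDown exec --no-startup-id xbacklight -dec 10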
Multiple monitor support
I have been able to hook up my two old monitors to the HDMI and
DisplayPort expansion cards on the laptop. The lid closes without
suspending the machine, and everything works great.
I actually run out of ports, even with a 4-port USB-A hub, which gives
me a total of 7 ports:
power (USB-C)
monitor 1 (DisplayPort)
monitor 2 (HDMI)
USB-A hub, which adds:
keyboard (USB-A)
mouse (USB-A)
Yubikey
external sound card
Now the latter, I might be able to get rid of if I switch to a
combo-jack headset, which I do have (and still need to test).
But still, this is a problem. I'll probably need a powered USB-C dock
and better monitors, possibly with some Thunderbolt chaining, to
save yet more ports.
But that means more money into this setup, argh. And figuring out my
monitor situation is the kind of thing I'm not that big
of a fan of. And neither is shopping for USB-C (or is it Thunderbolt?)
hubs.
My normal autorandr setup doesn't work: I have tried saving a
profile and it doesn't get autodetected, so I also first need to do:
autorandr -l framework-external-dual-lg-acer
The magic:
autorandr -l horizontal
... also works well.
The worst problem with those monitors right now is that they have a
radically smaller resolution than the main screen on the laptop, which
means I need to reset the font scaling to normal every time I switch
back and forth between those monitors and the laptop, which means I
actually need to do this:
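In practice that means re-merging a lower Xft.dpi into the X resource database when on the external monitors, and the higher value when back on the internal panel; a sketch, where the 96/144 values mirror the Xft.dpi discussion in the resolution tweaks section below:

echo "Xft.dpi: 96" | xrdb -merge   # external, lower-resolution monitors
echo "Xft.dpi: 144" | xrdb -merge  # back on the internal HiDPI panel

Already-running applications generally need to be restarted to pick up the change.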
Expansion ports
I ordered a total of 10 expansion ports.
I did manage to initialize the 1TB drive as an encrypted storage,
mostly to keep photos as this is something that takes a massive amount
of space (500GB and counting) and that I (unfortunately) don't work on
very often (but still carry around).
The expansion ports are fancy and nice, but not actually that
convenient. They're a bit hard to take out: you really need to crimp
your fingernails on there and pull hard to take them out. There's a
little button next to them to release, I think, but at first it feels
a little scary to pull those pucks out of there. You get used to it
though, and it's one of those things you can do without looking
eventually.
There's only four expansion ports. Once you have two monitors, the
drive, and power plugged in, bam, you're out of ports; there's nowhere
to plug my Yubikey. So if this is going to be my daily driver, with a
dual monitor setup, I will need a dock, which means more crap firmware
and uncertainty, which isn't great. There are actually plans to make a
dual-USB card, but that is blocked on designing an actual
board for this.
I can't wait to see more expansion cards produced. There's an Ethernet
expansion card which quickly went out of stock basically the day
it was announced, but was eventually restocked.
I would like to see a proper SD-card reader. There's a MicroSD card
reader, but that obviously doesn't work for normal SD cards, which
would be more broadly compatible anyways (because you can have a
MicroSD to SD card adapter, but I have never heard of the
reverse). Someone actually found a SD card reader that fits and
then someone else managed to cram it in a 3D printed case, which
is kind of amazing.
Still, I really like that idea that I can carry all those little
adapters in a pouch when I travel and can basically do anything I
want. It does mean I need to shuffle through them to find the right
one which is a little annoying. I have an elastic band to keep them
lined up so that all the ports show the same side, to make it easier
to find the right one. But that quickly gets undone and instead I have
a pouch full of expansion cards.
Another awesome thing with the expansion cards is that they don't just
work on the laptop: anything that takes USB-C can take those cards,
which means you can use it to connect an SD card to your phone, for
backups, for example. Heck, you could even connect an external display
to your phone that way, assuming that's supported by your phone of
course (and it probably isn't).
The expansion ports do take up some power, even when idle. See the
power management section below, and particularly the power usage
tests for details.
USB-C charging
One thing that is really a game changer for me is USB-C charging. It's
hard to overstate how convenient this is. I often have a USB-C cable
lying around to charge my phone, and I can just grab that thing and
pop it in my laptop. And while it will obviously not charge as fast as
the provided charger, it will stop draining the battery at least.
(As I wrote this, I had the laptop plugged in the Samsung charger that
came with a phone, and it was telling me it would take 6 hours to
charge the remaining 15%. With the provided charger, that flew down to
15 minutes. Similarly, I can power the laptop from the power grommet
on my desk, reducing clutter as I have that single wire out there
instead of the bulky power adapter.)
I also really like the idea that I can charge my laptop with a power
bank or, heck, with my phone, if push comes to shove. (And
vice-versa!)
This is awesome. And it works from any of the expansion ports, of
course. There's a little LED next to the expansion ports as well,
which indicates the charge status:
red/amber: charging
white: charged
off: unplugged
I couldn't find documentation about this, but the forum
answered.
This is something of a recurring theme with the Framework. While it
has a good knowledge base and repair/setup guides (and the
forum is awesome), it doesn't have a good "owner manual" that
shows you the different parts of the laptop and what they do. Again,
something the MNT reform did well.
Another thing that people are asking about is an external sleep
indicator: because the power LED is on the main keyboard assembly,
you don't actually see whether the device is active or not when the
lid is closed.
Finally, I wondered what happens when you plug in multiple power
sources and it turns out the charge controller is actually pretty
smart: it will pick the best power source and use it. The only
downside is it can't use multiple power sources, but that seems like
a bit much to ask.
Multimedia and other devices
Those things also work:
webcam: splendid, best webcam I've ever had (but my standards are
really low)
onboard mic: works well, good gain (maybe a bit much)
onboard speakers: sound okay, a little metal-ish, loud enough to be
annoying, see this thread for benchmarks, apparently pretty
good speakers
Combo jack mic tests
The Framework laptop ships with a combo jack on the left side, which
allows you to plug in a CTIA (source) headset. In human
terms, it's a device that has both a stereo output and a mono input,
typically a headset or ear buds with a microphone somewhere.
It works, which is better than the Purism (which only had audio
out), but is par for the course for that kind of onboard
hardware. Because of electrical interference, such sound cards very
often get lots of noise from the board.
With a Jabra Evolve 40, the built-in USB sound card generates
basically zero noise on silence (invisible down to -60dB in Audacity)
while plugging it in directly generates a solid -30dB hiss. There is
a noise-reduction system in that sound card, but the difference is
still quite striking.
On a comparable setup (curie, a 2017 Intel NUC), there is
also a hiss with the Jabra headset, but it's quieter, more in the order
of -40/-50 dB, a noticeable difference. Interestingly, testing with my
Mee Audio Pro M6 earbuds leads to a little more hiss on curie, more on
the -35/-40 dB range, close to the Framework.
Also note that another sound card, the Antlion USB adapter that comes
with the ModMic 4, also gives me pretty close to silence on a quiet
recording, picking up less than -50dB of background noise. It's
actually probably picking up the fans in the office, which do make
audible noises.
In other words, the hiss of the sound card built in the Framework
laptop is so loud that it makes more noise than the quiet fans in the
office. Or, another way to put it is that two USB sound cards (the
Jabra and the Antlion) are able to pick up ambient noise in my office
but not the Framework laptop.
See also my audio page.
Performance tests
Compiling Linux 5.19.11
On a single core, compiling the Debian version of the Linux kernel
takes around 100 minutes:
I had to plug in the normal power supply after a few minutes because
the battery would actually run out using my desk's power grommet (34
watts).
During compilation, fans were spinning really hard, quite noisy, but
not painfully so.
The laptop was sucking 55 watts of power, steadily:
Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Average 87.9 0.0 10.7 1.4 0.1 17.8 6583.6 5054.3 233.0 223.9 233.1 55.96
GeoMean 87.9 0.0 10.6 1.2 0.0 17.6 6427.8 5048.1 227.6 218.7 227.7 55.96
StdDev 1.4 0.0 1.2 0.6 0.2 3.0 1436.8 255.5 50.0 47.5 49.7 0.20
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Minimum 85.0 0.0 7.8 0.5 0.0 13.0 3594.0 4638.0 117.0 111.0 120.0 55.52
Maximum 90.8 0.0 12.9 3.5 0.8 38.0 10174.0 5901.0 374.0 362.0 375.0 56.41
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
CPU: 55.96 Watts on average with standard deviation 0.20
Note: power read from RAPL domains: package-0, uncore, package-0, core, psys.
These readings do not cover all the hardware in this device.
memtest86+
I ran Memtest86+ v6.00b3. It shows something like this:
Software setup
Once I had everything in the hardware setup, I figured, voilà, I'm
done, I'm just going to boot this beautiful machine and I can get back
to work.
I don't understand why I am so naïve sometimes. It's mind boggling.
Obviously, it didn't happen that way at all, and I spent the best of
the three following days tinkering with the laptop.
Secure boot and EFI
First, I couldn't boot off of the NVMe drive I transferred from the
previous laptop (the Purism) and the
BIOS was not very helpful: it was just complaining about not finding
any boot device, without dropping me in the real BIOS.
At first, I thought it was a problem with my NVMe drive, because it's
not listed in the compatible SSD drives from upstream. But I
figured out how to enter BIOS (press F2 manically, of
course), which showed the NVMe drive was actually detected. It just
didn't boot, because it was an old (2010!!) Debian install without
EFI.
So from there, I disabled secure boot, and booted a grml image to
try to recover. And by "boot" I mean, I managed to get to the grml
boot loader which promptly failed to load its own root file system
somehow. I still have to investigate exactly what happened there, but
it failed some time after the initrd load with:
Unable to find medium containing a live file system
This, it turns out, was fixed in Debian lately, so a daily GRML
build will not have this problem. The upcoming 2022 release
(likely 2022.10 or 2022.11) will also get the fix.
I did manage to boot the development version of the Debian
installer which was a surprisingly good experience: it mounted the
encrypted drives and did everything pretty smoothly. It even offered
to reinstall the boot loader, but that ultimately (and correctly, as
it turns out) failed because I didn't have a /boot/efi partition.
At this point, I realized there was no easy way out of this, and I
just proceeded to completely reinstall Debian. I had a spare NVMe
drive lying around (backups FTW!) so I just swapped that in, rebooted
in the Debian installer, and did a clean install. I wanted to switch
to bookworm anyways, so I guess that's done too.
Storage limitations
Another thing that happened during setup is that I tried to copy over
the internal 2.5" SSD drive from the Purism to the Framework 1TB
expansion card. There's no 2.5" slot in the new laptop, so that's
pretty much the only option for storage expansion.
I was tired and did something wrong. I ended up wiping the partition
table on the original 2.5" drive.
Oops.
It might be recoverable, but just restoring the partition table
didn't work either, so I'm not sure how I would recover the data
there. Normally, everything on my laptops and workstations is designed
to be disposable, so that wasn't that big of a problem. I did manage
to recover most of the data thanks to git-annex reinit, but
that was a little hairy.
Bootstrapping Puppet
Once I had some networking, I had to install all the packages I
needed. The time I spent setting up my workstations with Puppet has
finally paid off. What I actually did was to restore two critical
directories:
/etc/ssh
/var/lib/puppet
So that I would keep the previous machine's identity. That way I could
contact the Puppet server and install whatever was missing. I used my
Puppet optimization
trick to do a batch
install and then I had a good base setup, although not exactly as it
was before. 1700 packages were installed manually on angela before
the reinstall, and not in Puppet.
I did not inspect each one individually, but I did go through /etc
and copied over more SSH keys, for backups and SMTP over SSH.
LVFS support
It looks like there's support for the (de-facto) standard LVFS
firmware update system. At least I was able to update the UEFI
firmware with a simple:
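Presumably the standard fwupd invocation, something along the lines of:

fwupdmgr enable-remote lvfs-testing   # only needed if the update is still in the LVFS testing remote
fwupdmgr refresh
fwupdmgr update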
Those instructions come from the beta forum post. I performed the
BIOS update on 2023-01-16T16:00-0500.
Resolution tweaks
The Framework laptop resolution (2256px X 1504px) is big enough to
give you a pretty small font size, so welcome to the marvelous world
of "scaling".
The Debian wiki page has a few tricks for this.
Console
This will make the console and grub fonts more readable:
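I assume this boils down to the usual Debian console-setup reconfiguration, picking a larger font (the GRUB side is handled by the GRUB_GFXMODE tweak shown later in this post):

sudo dpkg-reconfigure console-setup   # pick a larger font, e.g. Terminus 16x32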
Xorg
Adding this to your .Xresources will make everything look much bigger:
! 1.5*96
Xft.dpi: 144
Apparently, some of this can also help:
! These might also be useful depending on your monitor and personal preference:
Xft.autohint: 0
Xft.lcdfilter: lcddefault
Xft.hintstyle: hintfull
Xft.hinting: 1
Xft.antialias: 1
Xft.rgba: rgb
In my experience it also makes things look a little fuzzier, which is
frustrating because you have this awesome monitor but everything looks
out of focus. Just bumping Xft.dpi by a 1.5 factor looks good to me.
The Debian Wiki has a page on HiDPI, but it's not as good as the
Arch Wiki, where the above blurb comes from. I am not using the
latter because I suspect it's causing some of the "fuzziness".
TODO: find the equivalent of this GNOME hack in i3? (gsettings set
org.gnome.mutter experimental-features
"['scale-monitor-framebuffer']"), taken from this Framework
guide
Issues
BIOS configuration
The Framework BIOS has some minor issues. One issue I personally
encountered is that I had disabled Quick boot and Quiet boot in
the BIOS to diagnose the above boot issues. This, in turn, triggers a
bug where the BIOS boot manager (F12) would just hang
completely. It would also fail to boot from an external USB drive.
The current fix (as of BIOS 3.03) is to re-enable both Quick
boot and Quiet boot. Presumably this is something that will get
fixed in a future BIOS update.
Note that the following keybindings are active in the BIOS POST
check:
Key     Meaning
F2      Enter BIOS setup menu
F12     Enter BIOS boot manager
Delete  Enter BIOS setup menu
WiFi compatibility issues
I couldn't make WiFi work at first. Obviously, the default Debian
installer doesn't ship with proprietary firmware (although that might
change soon) so the WiFi card didn't work out of the box. But even
after copying the firmware through a USB stick, I couldn't quite
manage to find the right combination of ip/iw/wpa-supplicant
(yes, after repeatedly copying a bunch more packages over to get
those bootstrapped). (Next time I should probably try something like
this post.)
Thankfully, I had a little USB-C dongle with a RJ-45 jack lying
around. That also required a firmware blob, but it was a single
package to copy over, and with that loaded, I had network.
Eventually, I did manage to make WiFi work; the problem was more on
the side of "I forgot how to configure a WPA network by hand from the
commandline" than anything else. NetworkManager worked fine and got
WiFi working correctly.
Note that this is with Debian bookworm, which has the 5.19 Linux
kernel, and with the firmware-nonfree (firmware-iwlwifi,
specifically) package.
Battery life
I was getting about 7 hours of battery on the Purism Librem
13v4, and that's after a year or two of battery life. Now, I still
have about 7 hours of battery life, which is nicer than my old
ThinkPad X220 (20 minutes!) but really, it's not that good for a new
generation laptop. The 12th generation Intel chipset probably improved
things compared to the previous Framework laptop, but I don't have
an 11th gen Framework to compare with.
(Note that those are estimates from my status bar, not wall clock
measurements. They should still be comparable between the Purism and
Framework, that said.)
The battery life doesn't seem up to, say, Dell XPS 13, ThinkPad X1, and
of course not the Apple M1, where I would expect 10+ hours of battery
life out of the box.
That said, I do get those kinds of estimates when the machine is fully
charged and idle. In fact, when everything is quiet and nothing is
plugged in, I get dozens of hours of battery life estimated (I've
seen 25h!). So power usage fluctuates quite a bit depending on usage,
which I guess is expected.
Concretely, so far, light web browsing, reading emails and writing
notes in Emacs (e.g. this file) takes about 8W of power:
Expansion cards matter a lot in the battery life (see below for a
thorough discussion); my normal setup is 2xUSB-C and 1xUSB-A (yes,
with an empty slot, and yes, to save power).
Interestingly, playing a (720p) video in a window takes up
more power (10.5W) than in full screen (9.5W) but I blame that on my
desktop setup (i3 + compton)... Not sure if mpv hits the
VA-API, maybe not in windowed mode. Similar results with 1080p,
interestingly, except the window struggles to keep up altogether. Full
screen playback takes a relatively comfortable 9.5W, which means a
solid 5h+ of playback, which is fine by me.
Fooling around the web, small edits, youtube-dl, and I'm at around 80%
battery after about an hour, with an estimated 5h left, which is a
little disappointing. I had a 7h remaining estimate before I started
goofing around Discourse, so I suspect the website is a pretty
big battery drain, actually. I see about 10-12 W, while I was probably at
half that (6-8W) just playing music with mpv in the background...
In other words, it looks like editing posts in Discourse with Firefox
takes a solid 4-6W of power. Amazing and gross.
(When writing about abusive power usage generates more power usage, is
that a heisenbug? Or a schrödinbug?)
Power management
Compared to the Purism Librem 13v4, the ongoing power usage seems to
be slightly better. An anecdotal metric is that the Purism would take
800mA idle, while the more powerful Framework manages a little over
500mA as I'm typing this, fluctuating between 450 and 600mA. That is
without any active expansion card, except the storage. Those numbers
come from the output of tlp-stat -b and, unfortunately, the "ampere"
unit makes it quite hard to compare those, because voltage is not
necessarily the same between the two platforms.
TODO: i915 driver has a lot of parameters, including some about
power saving, see, again, the arch wiki, and particularly
enable_fbc=1
TL;DR: power management on the laptop is an issue, but there are various
tweaks you can make to improve it. Try:
powertop --auto-tune
apt install tlp && systemctl enable tlp
nvme.noacpi=1 mem_sleep_default=deep on the kernel command line
may help with standby power usage
keep only USB-C expansion cards plugged in, all others suck power
even when idle
consider upgrading the BIOS to latest beta (3.06 at the time of
writing), unverified power savings
latest Linux kernels (6.2) promise power savings as well
(unverified)
Update: also try to follow the official optimization guide. It
was made for Ubuntu but will probably also work for your distribution
of choice with a few tweaks. They recommend using tlpui but it's
not packaged in Debian. There is, however, a Flatpak
release. In my case, it resulted in the following diff to
tlp.conf: tlp.patch.
Background on CPU architecture
There were power problems in the 11th gen Framework laptop, according
to this report from Linux After Dark, so the issues with power
management on the Framework are not new.
The 12th generation Intel CPU (AKA "Alder Lake") is a big-little
architecture with "power-saving" and "performance" cores. There
used to be performance problems introduced by the scheduler in Linux
5.16 but those were eventually fixed in 5.18, which uses
Intel's hardware as an "intelligent, low-latency hardware-assisted
scheduler". According to Phoronix, the 5.19 release improved the
power saving, at the cost of some performance penalty. There were also patch
series to make the scheduler configurable, but it doesn't look like
those have been merged as of 5.19. There was also a session about this
at the 2022 Linux Plumbers, but they stopped short of
talking more about the specific problems Linux is facing in Alder
Lake:
Specifically, the kernel's energy-aware scheduling heuristics don't
work well on those CPUs. A number of features present there
complicate the energy picture; these include SMT, Intel's "turbo
boost" mode, and the CPU's internal power-management mechanisms. For
many workloads, running on an ostensibly more power-hungry Pcore can
be more efficient than using an Ecore. Time for discussion of the
problem was lacking, though, and the session came to a close.
All this to say that the 12th gen Intel line shipped with this Framework
series should have better power management thanks to its
power-saving cores. And Linux has had the scheduler changes to make
use of this (but maybe is still having trouble). In any case, this
might not be the source of power management problems on my laptop,
quite the opposite.
Also note that the firmware updates for various chipsets are
supposed to improve things eventually.
On the other hand, The Verge simply declared the whole P-series
a mistake...
Attempts at improving power usage
I did try to follow some of the tips in this forum post. The
tricks powertop --auto-tune and tlp's
PCIE_ASPM_ON_BAT=powersupersave basically did nothing: I was stuck
at 10W power usage in powertop (600+mA in tlp-stat).
Apparently, I should be able to reach the C8 CPU power state (or
even C9, C10) in powertop, but I seem to be stuck at
C7. (Although I'm not sure how to read that tab in powertop: in the
Core(HW) column there's only C3/C6/C7 states, and most cores are 85%
in C7 or maybe C6. But the next column over does show many CPUs in
C10 states.)
As it turns out, the graphics card actually takes up a good chunk of
power unless proper power management is enabled (see below). After
tweaking this, I did manage to get down to around 7W power usage in
powertop.
Expansion cards actually do take up power, and so does the screen,
obviously. The fully-lit screen takes a solid 2-3W of power compared
to the fully dimmed screen. When removing all expansion cards and
making the laptop idle, I can spin it down to 4 watts of power usage at
the moment, and an amazing 2 watts when the screen is turned off.
Caveats
Abusive (10W+) power usage that I initially found could be a problem
with my desktop configuration: I have this silly status bar that
updates every second and probably causes redraws... The CPU certainly
doesn't seem to spin down below 1GHz. Also note that this is with an
actual desktop running with everything: it could very well be that
some things (I'm looking at you Signal Desktop) take up unreasonable
amount of power on their own (hello, 1W/electron, sheesh). Syncthing
and containerd (Docker!) also seem to take a good 500mW just sitting
there.
Beyond my desktop configuration, this could, of course, be a
Debian-specific problem; your favorite distribution might be better at
power management.
Idle power usage tests
Some expansion cards waste energy, even when unused. Here is a summary
of the findings from the powerstat page. I also include other
devices tested in this page for completeness:
Device        Minimum  Average  Max    Stdev   Note
Screen, 100%  2.4W     2.6W     2.8W   N/A
Screen, 1%    30mW     140mW    250mW  N/A
Backlight 1   290mW    ?        ?      ?       fairly small, all things considered
Backlight 2   890mW    1.2W     3W?    460mW?  geometric progression
Backlight 3   1.69W    1.5W     1.8W?  390mW?  significant power use
Radios        100mW    250mW    N/A    N/A
USB-C         N/A      N/A      N/A    N/A     negligible power drain
USB-A         10mW     10mW     ?      10mW    almost negligible
DisplayPort   300mW    390mW    600mW  N/A     not passive
HDMI          380mW    440mW    1W?    20mW    not passive
1TB SSD       1.65W    1.79W    2W     12mW    significant, probably higher when busy
MicroSD       1.6W     3W       6W     1.93W   highest power usage, possibly even higher when busy
Ethernet      1.69W    1.64W    1.76W  N/A     comparable to the SSD card
So it looks like all expansion cards but the USB-C ones are active,
i.e. they draw power even when idle. The USB-A cards are the least concern,
sucking out 10mW, pretty much within the margin of error. But both the
DisplayPort and HDMI do take a few hundred milliwatts. It looks like
USB-A connectors have this fundamental flaw that they necessarily draw
some power because they lack the power negotiation features of
USB-C. At least according to this post:
It seems the USB A must have power going to it all the time, that
the old USB 2 and 3 protocols, the USB C only provides power when
there is a connection. Old versus new.
Apparently, this is a problem specific to the USB-C to USB-A
adapter that ships with the Framework. Some people have actually
changed their orders to all USB-C because of this problem, but I'm
not sure the problem is as serious as claimed in the forums. I
couldn't reproduce the "one watt" power drains suggested elsewhere,
at least not repeatedly. (A previous version of this post did show
such a power drain, but it was in a less controlled test
environment than the series of more rigorous tests above.)
The worst offenders are the storage cards: the SSD drive takes at
least one watt of power and the MicroSD card seems to want to take all
the way up to 6 watts of power, both just sitting there doing
nothing. This confirms claims of 1.4W for the SSD (but not
5W) power usage found elsewhere. The former post has
instructions on how to disable the card in software. The MicroSD card
has been reported as using 2 watts, but I've seen it as high as 6
watts, which is pretty damning.
The Framework team has a beta update for the DisplayPort adapter
but currently only for Windows (LVFS technically possible, "under
investigation"). A USB-A firmware update is alsounder
investigation. It is therefore likely at least some of those power
management issues will eventually be fixed.
Note that the upcoming Ethernet card has a reported 2-8W power usage,
depending on traffic. I did my own power usage tests in
powerstat-wayland and they seem lower than 2W.
The upcoming 6.2 Linux kernel might also improve battery usage when
idle, see this Phoronix article for details, likely in early
2023.
Idle power usage tests under Wayland
Update: I redid those tests under Wayland, see powerstat-wayland
for details. The TL;DR: is that power consumption is either smaller or
similar.
Idle power usage tests, 3.06 beta BIOS
I redid the idle tests after the 3.06 beta BIOS update and ended
up with these results:
Device            Minimum  Average  Max    Stdev  Note
Baseline          1.96W    2.01W    2.11W  30mW   1 USB-C, screen off, backlight off, no radios
2 USB-C           1.95W    2.16W    3.69W  430mW  USB-C confirmed as mostly passive...
3 USB-C           1.95W    2.16W    3.69W  430mW  ... although with extra stdev
1TB SSD           3.72W    3.85W    4.62W  200mW  unchanged from before upgrade
1 USB-A           1.97W    2.18W    4.02W  530mW  unchanged
2 USB-A           1.97W    2.00W    2.08W  30mW   unchanged
3 USB-A           1.94W    1.99W    2.03W  20mW   unchanged
MicroSD w/o card  3.54W    3.58W    3.71W  40mW   significant improvement! 2-3W power saving!
MicroSD w/ card   3.53W    3.72W    5.23W  370mW  new measurement! increased deviation
DisplayPort       2.28W    2.31W    2.37W  20mW   unchanged
1 HDMI            2.43W    2.69W    4.53W  460mW  unchanged
2 HDMI            2.53W    2.59W    2.67W  30mW   unchanged
External USB      3.85W    3.89W    3.94W  30mW   new result
Ethernet          3.60W    3.70W    4.91W  230mW  unchanged
Note that the table summary is different from the previous table: here
we show the absolute numbers while the previous table was doing a
confusing attempt at showing relative (to the baseline) numbers.
Conclusion: the 3.06 BIOS update did not significantly change idle
power usage stats except for the MicroSD card which has
significantly improved.
The new "external USB" test is also interesting: it shows how the
provided 1TB SSD card performs (admirably) compared to existing
devices. The other new result is the MicroSD card with a card which,
interestingly, uses less power than the 1TB SSD drive.
That's 8mAh per 10 minutes (and 2 seconds), or 48mA, or, with this
battery, about 127 hours or roughly 5 days of standby. Not bad!
In comparison, here is my really old x220, before:
sep 29 22:13:54 emma systemd-sleep[176315]: /sys/class/power_supply/BAT0/energy_now = 5070 [mWh]
... after:
sep 29 22:23:54 emma systemd-sleep[176486]: /sys/class/power_supply/BAT0/energy_now = 4980 [mWh]
... which is 90 mWh in 10 minutes, or a whopping 540mW, which was
possibly okay when this battery was new (62000 mWh, so about 100
hours, or about 5 days), but this battery is almost dead and has
only 5210 mWh when full, so only 10 hours standby.
And here is the Framework performing a similar test, before:
... which is 49mAh in a little over 10 minutes (and 4 seconds), or
292mA, much more than the Purism, but half of the X220. At this rate,
the battery would last on standby only 12 hours!! That is pretty
bad.
Note that this was done with the following expansion cards:
2 USB-C
1 1TB SSD drive
1 USB-A with a hub connected to it, with keyboard and LAN
Preliminary tests without the hub (over one minute) show that it
doesn't significantly affect this power consumption (300mA).
This guide also suggests booting with nvme.noacpi=1 but this
still gives me about 5mAh/min (or 300mA).
Adding mem_sleep_default=deep to the kernel command line does make a
difference. Before:
... which is 2mAh in 74 seconds, or 97mA, which brings us to a more
reasonable 36 hours, or a day and a half. It's still above the x220
power usage, and more than an order of magnitude more than the Purism
laptop. It's also far from the 0.4% promised by upstream, which
would be 14mA for the 3500mAh battery.
It should also be noted that this "deep" sleep mode is a little more
disruptive than regular sleep. As you can see by the timing, it took
more than 10 seconds for the laptop to resume, which feels a little
alarming as you're banging the keyboard to bring it back to life.
You can confirm the current sleep mode with:
# cat /sys/power/mem_sleep
s2idle [deep]
In the above, deep is selected. You can change it on the fly with:
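For example, by writing either value into that same sysfs file:

echo deep | sudo tee /sys/power/mem_sleep   # or "s2idle" to switch back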
... better! 6 mAh in about 6 minutes, works out to 63.5mA, so more
than two days standby.
A longer test:
oct 01 09:22:56 angela systemd-sleep[62978]: /sys/class/power_supply/BAT1/charge_now = 3327 [mAh]
oct 01 12:47:35 angela systemd-sleep[63219]: /sys/class/power_supply/BAT1/charge_now = 3147 [mAh]
That's 180mAh in about 3.5h, 52mA! Now at 66h, or almost 3 days.
I wasn't sure why I was seeing such fluctuations in those tests, but
as it turns out, expansion card power tests show that they do
significantly affect power usage, especially the SSD drive, which can
take up to two full watts of power even when idle. I didn't control
for expansion cards in the above tests, running them with whatever
card I had plugged in without paying attention, so that's likely the
cause of the high power usage and fluctuations.
It might be possible to work around this problem by disabling USB
devices before suspend. TODO. See also this post.
In the meantime, I have been able to get much better suspend
performance by unplugging all modules. Then I get this result:
oct 04 11:15:38 angela systemd-sleep[257571]: /sys/class/power_supply/BAT1/charge_now = 3203 [mAh]
oct 04 15:09:32 angela systemd-sleep[257866]: /sys/class/power_supply/BAT1/charge_now = 3145 [mAh]
Which is 14.8mA! Almost exactly the number promised by Framework! With
a full battery, that means a 10 days suspend time. This is actually
pretty good, and far beyond what I was expecting when starting down
this journey.
So, once the expansion cards are unplugged, suspend power usage is
actually quite reasonable. More detailed standby tests are available
in the standby-tests page, with a summary below.
There is also some hope that the Chromebook edition,
specifically designed with a specification of 14 days of standby
time, could bring some firmware improvements back down to the
normal line. Some of those issues were reported upstream in April
2022, but there doesn't seem to have been any progress there
since.
TODO: one final solution here is suspend-then-hibernate, which
Windows uses for this
TODO: consider implementing the S0ix sleep states, see also troubleshooting
TODO: consider https://github.com/intel/pm-graph
Standby expansion cards test results
This table is a summary of the more extensive standby-tests I have performed:
Device       Wattage  Amperage  Days  Note
baseline     0.25W    16mA      9     sleep=deep nvme.noacpi=1
s2idle       0.29W    18.9mA    ~7    sleep=s2idle nvme.noacpi=1
normal nvme  0.31W    20mA      ~7    sleep=s2idle without nvme.noacpi=1
1 USB-C      0.23W    15mA      ~10
2 USB-C      0.23W    14.9mA          same as above
1 USB-A      0.75W    48.7mA    3     +500mW (!!) for the first USB-A card!
2 USB-A      1.11W    72mA      2     +360mW
3 USB-A      1.48W    96mA      <2    +370mW
1TB SSD      0.49W    32mA      <5    +260mW
MicroSD      0.52W    34mA      ~4    +290mW
DisplayPort  0.85W    55mA      <3    +620mW (!!)
1 HDMI       0.58W    38mA      ~4    +250mW
2 HDMI       0.65W    42mA      <4    +70mW
Conclusions:
USB-C cards take no extra power on suspend, possibly less
than empty slots, more testing required
USB-A cards take a lot more power on suspend
(300-500mW) than on regular idle (~10mW, almost negligible)
1TB SSD and MicroSD cards seem to take a reasonable
amount of power (260-290mW), compared to their runtime
equivalents (1-6W!)
DisplayPort takes a surprisingly large amount of power (620mW), almost
double its average runtime usage (390mW)
HDMI cards take, surprisingly, less power (250mW) in
standby than the DP card (620mW)
and oddly, a second card adds less power usage (70mW?!) than the
first, maybe a circuit is used by both?
Standby expansion cards test results, 3.06 beta BIOS
Framework recently (2022-11-07) announced that they will publish
a firmware upgrade to address some of the USB-C issues, including
power management. This could positively affect the above result,
improving both standby and runtime power usage.
The update came out in December 2022 and I redid my analysis with
the following results:
Device       Wattage  Amperage  Days  Note
baseline     0.25W    16mA      9     no cards, same as before upgrade
1 USB-C      0.25W    16mA      9     same as before
2 USB-C      0.25W    16mA      9     same
1 USB-A      0.80W    62mA      3     +550mW!! worse than before
2 USB-A      1.12W    73mA      <2    +320mW, on top of the above, bad!
Ethernet     0.62W    40mA      3-4   new result, decent
1TB SSD      0.52W    34mA      4     a bit worse than before (+2mA)
MicroSD      0.51W    22mA      4     same
DisplayPort  0.52W    34mA      4+    upgrade improved by 300mW
1 HDMI       ?        38mA      ?     same
2 HDMI       ?        45mA      ?     a bit worse than before (+3mA)
Normal       1.08W    70mA      ~2    Ethernet, 2 USB-C, USB-A
Full results in standby-tests-306. The big takeaway for me is that
the update did not improve power usage on the USB-A ports which is a
big problem for my use case. There is a notable improvement on the
DisplayPort power consumption which brings it more in line with the
HDMI connector, but it still doesn't properly turn off on suspend
either.
Even worse, the USB-A ports now sometimes fail to resume after
suspend, which is pretty annoying. This is a known problem
that will hopefully get fixed in the final release.
I looked at building this myself but failed to run it. I opened an
RFP in Debian so that we can ship this in Debian, and also documented
my work there.
Note that there is now a counter that tracks charge/discharge
cycles. It's visible in tlp-stat -b, which is a nice
improvement:
Ethernet expansion card
The Framework ethernet expansion card is a fancy little doodle:
"2.5Gbit/s and 10/100/1000Mbit/s Ethernet", the "clear housing lets
you peek at the RTL8156 controller that powers it". Which is another
way to say "we didn't completely finish prod on this one, so it kind
of looks like we 3D-printed this in the shop"....
The card is a little bulky, but I guess that's inevitable considering
the RJ-45 form factor when compared to the thin Framework laptop.
I have had a serious issue when trying it at first: the link LEDs
just wouldn't come up. I made a full bug report in the forum and
with upstream support, but eventually figured it out on my own. It's
(of course) a power saving issue: if you reboot the machine, the links
come up when the laptop is running the BIOS POST check and even when
the Linux kernel boots.
I first thought that the problem was likely related to the
powertop service which I run at boot time to tweak some power saving
settings.
It seems like this:
By default, USB power saving is active in the kernel, but not
force-enabled for incompatible drivers. That is, devices that
support suspension will suspend, drivers that do not, will not.
So the fix is actually to uninstall tlp or disable that setting by
adding this to /etc/tlp.conf:
USB_AUTOSUSPEND=0
... but that disables auto-suspend on all USB devices, which may
hurt other power usage performance. I have found that a
combination of:
USB_AUTOSUSPEND=1
USB_DENYLIST="0bda:8156"
and this on the kernel commandline:
usbcore.quirks=0bda:8156:k
... actually does work correctly. I now have this in my
/etc/default/grub.d/framework-tweaks.cfg file:
# net.ifnames=0: normal interface names ffs (e.g. eth0, wlan0, not wlp166s0)
# nvme.noacpi=1: reduce SSD disk power usage (not working)
# mem_sleep_default=deep: reduce power usage during sleep (not working)
# usbcore.quirks is a workaround for the ethernet card suspend bug: https://guides.frame.work/Guide/Fedora+37+Installation+on+the+Framework+Laptop/108?lang=en
GRUB_CMDLINE_LINUX="net.ifnames=0 nvme.noacpi=1 mem_sleep_default=deep usbcore.quirks=0bda:8156:k"
# fix the resolution in grub for fonts to not be tiny
GRUB_GFXMODE=1024x768
Other than that, I haven't been able to max out the card because I
don't have other 2.5Gbit/s equipment at home, which is strangely
satisfying. But running against my Turris Omnia
router, I could pretty much max a gigabit fairly easily:
The card doesn't require any proprietary firmware blobs which is
surprising. Other than the power saving issues, it just works.
In my power tests (see powerstat-wayland), the Ethernet card seems
to use about 1.6W of power idle, without link, in the above "quirky"
configuration where the card is functional but without autosuspend.
Proprietary firmware blobs
The Framework does need proprietary firmware to operate. Specifically:
the WiFi network card shipped with the DIY kit is an AX210 card that
requires a 5.19 kernel or later, and the firmware-iwlwifi non-free firmware package
the Bluetooth adapter also loads the firmware-iwlwifi
package (untested)
the graphics work out of the box without firmware, but certain
power management features come only with special proprietary
firmware, normally shipped in the firmware-misc-nonfree
but currently missing from the package
Note that, at the time of writing, the latest i915 firmware from
linux-firmware has a serious bug where loading all the
accessible firmware results in a noticeable (I estimate 200-500ms) lag
between the keyboard (not the mouse!) and the display. Symptoms also
include tearing and shearing of windows, it's pretty nasty.
One workaround is to delete the two affected firmware files:
cd /lib/firmware && rm adlp_guc_70.1.1.bin adlp_guc_69.0.3.bin
update-initramfs -u
You will get the following warning during build, which is good as
it means the problematic firmware is disabled:
W: Possible missing firmware /lib/firmware/i915/adlp_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/adlp_guc_70.1.1.bin for module i915
But then it also means that critical firmware isn't loaded, which
means, among other things, a higher battery drain. I was able to move
from 8.5-10W down to the 7W range after making the firmware work
properly. This is also after turning the backlight all the way down,
as that takes a solid 2-3W at full blast.
The proper fix is to use some compositing manager. I ended up using
compton with the following systemd unit:
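A minimal sketch of such a user unit (the exact compton flags are a matter of taste):

[Unit]
Description=compton compositing manager

[Service]
ExecStart=/usr/bin/compton --backend glx
Restart=on-failure

[Install]
WantedBy=default.target

... enabled with systemctl --user enable --now compton, assuming DISPLAY is available in the user session environment.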
compton is orphaned however, so you might be tempted to use
picom instead, but in my experience the latter uses much
more power (1-2W extra, similar experience). I also tried
compiz but it would just crash with:
anarcat@angela:~$ compiz --replace
compiz (core) - Warn: No XI2 extension
compiz (core) - Error: Another composite manager is already running on screen: 0
compiz (core) - Fatal: No manageable screens found on display :0
When running from the base session, I would get this instead:
Also note that the iwlwifi firmware looks incomplete. Even with
the package installed, I get those errors in dmesg:
[ 19.534429] Intel(R) Wireless WiFi driver for Linux
[ 19.534691] iwlwifi 0000:a6:00.0: enabling device (0000 -> 0002)
[ 19.541867] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[ 19.541881] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[ 19.541882] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-72.ucode failed with error -2
[ 19.541890] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[ 19.541895] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[ 19.541896] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-71.ucode failed with error -2
[ 19.541903] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[ 19.541907] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[ 19.541908] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-70.ucode failed with error -2
[ 19.541913] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[ 19.541916] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[ 19.541917] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-69.ucode failed with error -2
[ 19.541922] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[ 19.541926] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[ 19.541927] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-68.ucode failed with error -2
[ 19.541933] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[ 19.541937] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[ 19.541937] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-67.ucode failed with error -2
[ 19.544244] iwlwifi 0000:a6:00.0: firmware: direct-loading firmware iwlwifi-ty-a0-gf-a0-66.ucode
[ 19.544257] iwlwifi 0000:a6:00.0: api flags index 2 larger than supported by driver
[ 19.544270] iwlwifi 0000:a6:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.1
[ 19.544523] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[ 19.544528] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[ 19.544530] iwlwifi 0000:a6:00.0: loaded firmware version 66.55c64978.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
Some of those are available in the latest upstream firmware package
(iwlwifi-ty-a0-gf-a0-71.ucode, -68, and -67), but not all
(e.g. iwlwifi-ty-a0-gf-a0-72.ucode is missing). It's unclear what
those do or don't do, as the WiFi seems to work well without them.
I still copied them in from the latest linux-firmware package in the
hope they would help with power management, but I did not notice a
change after loading them.
There are also multiple knobs on the iwlwifi and iwlmvm
drivers. The latter has a power_scheme setting which defaults to
2 (balanced), setting it to 3 (low power) could improve
battery usage as well, in theory. The iwlwifi driver also has
power_save (defaults to disabled) and power_level (1-5, defaults
to 1) settings. See also the output of modinfo iwlwifi and
modinfo iwlmvm for other driver options.
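For example, something like this in /etc/modprobe.d could persist those
settings at boot. This is only a sketch based on the option names above, I
haven't tested it; double-check the exact parameter names with modinfo first:
# /etc/modprobe.d/iwlwifi-power.conf (sketch, untested)
options iwlmvm power_scheme=3
options iwlwifi power_save=1 power_level=3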
Graphics acceleration
After loading the latest upstream firmware and setting up a
compositing manager (compton, above), I tested the classic
glxgears.
Running in a window gives me odd results, as the gears basically grind
to a halt:
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
137 frames in 5.1 seconds = 26.984 FPS
27 frames in 5.4 seconds = 5.022 FPS
Ouch. 5FPS!
But interestingly, once the window is in full screen, it does hit the
monitor refresh rate:
300 frames in 5.0 seconds = 60.000 FPS
I'm not really a gamer and I'm not normally using any of that fancy
graphics acceleration stuff (except maybe my browser does?).
I installed intel-gpu-tools for the intel_gpu_top
command to confirm the GPU was engaged when doing those simulations. A
nice find. Other useful diagnostic tools include glxgears and
glxinfo (in mesa-utils) and vainfo (in the vainfo package).
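A quick sketch of how I poke at those to check the stack; exact flags may
vary between versions:
glxinfo -B | grep -i renderer   # which GPU/driver Mesa actually picked
vainfo                          # VA-API driver and supported decode profiles
sudo intel_gpu_top              # live per-engine GPU utilization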
Following this post, I also made sure to have those settings
in my about:config in Firefox, or, in user.js:
user_pref("media.ffmpeg.vaapi.enabled", true);
Note that the guide suggests many other settings to tweak, but those
might actually be overkill, see this comment and its parents. I
did try forcing hardware acceleration by setting gfx.webrender.all
to true, but everything became choppy and weird.
The guide also mentions installing the intel-media-driver package,
but I could not find that in Debian.
The Arch wiki has, as usual, an excellent reference on hardware
acceleration in Firefox.
Chromium / Signal desktop bugs
It looks like both Chromium and Signal Desktop misbehave with my
compositor setup (compton + i3). The fix is to add a persistent
flag to Chromium. In Arch, it's conveniently in
~/.config/chromium-flags.conf but that doesn't actually work in
Debian. I had to put the flag in
/etc/chromium.d/disable-compositing, like this:
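The exact file content is not reproduced here; assuming Debian's chromium
wrapper sources the snippets under /etc/chromium.d/ to build up
$CHROMIUM_FLAGS, it would look something like this, with the same flag used
for Signal below:
# /etc/chromium.d/disable-compositing (sketch)
export CHROMIUM_FLAGS="$CHROMIUM_FLAGS --disable-gpu-compositing"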
It's possible another one of the hundreds of flags might fix this
issue better, but I don't really have time to go through this entire,
incomplete, and unofficial list (!?!).
Signal Desktop has a similar problem, and doesn't reuse those flags
(because of course it doesn't). Instead, I had to rewrite the wrapper
script in /usr/local/bin/signal-desktop to use this:
exec /usr/bin/flatpak run --branch=stable --arch=x86_64 org.signal.Signal --disable-gpu-compositing "$@"
This was mostly done in this Puppet commit.
I haven't figured out the root of this problem. I did try using
picom and xcompmgr; they both suffer from the same issue. Another
Debian testing user on Wayland told me they haven't seen this problem,
so hopefully this can be fixed by switching to
wayland.
Graphics card hangs
I believe I might have this bug which results in a total
graphical hang for 15-30 seconds. It's fairly rare so it's not too
disruptive, but when it does happen, it's pretty alarming.
The comments on that bug report are encouraging though: it seems this
is a bug in either mesa or the Intel graphics driver, which means many
people have this problem so it's likely to be fixed. There's actually
a merge request on mesa already (2022-12-29).
It could also be that bug because the error message I get is
actually:
Jan 20 12:49:10 angela kernel: Asynchronous wait on fence 0000:00:02.0:sway[104431]:cb0ae timed out (hint:intel_atomic_commit_ready [i915])
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
It's a solid 30-second graphical hang; the keyboard and everything
else might keep working. The latter bug report is quite long,
with many comments, but this one from January 2023 seems to say
that Sway 1.8 fixed the problem. There's also an earlier patch to
add an extra kernel parameter that supposedly fixes that too. There's
all sorts of other workarounds in there, for example this:
from this comment... So that one is unsolved, as far as the
upstream drivers are concerned, but maybe could be fixed through Sway.
Weird USB hangs / graphical glitches
I have had weird connectivity glitches better described in this
post, but basically: my USB keyboard and mice (connected over a
USB hub) drop keys, lag a lot or hang, and I get visual glitches.
The fix was to tighten the screws around the CPU on the motherboard
(!), which is, thankfully, a rather simple repair.
USB docks are hell
Note that the monitors are hooked up to angela through a USB-C /
Thunderbolt dock from Cable Matters, with the lovely name of
201053-SIL. It has issues, see this blog
post for an in-depth discussion.
Shipping details
I ordered the Framework in August 2022 and received it about a month
later, which is sooner than expected because the August batch was
late.
People (including me) expected this to have an impact on the September
batch, but it seems Framework have been able to fix the delivery
problems and keep up with the demand.
As of early 2023, their website announces that laptops ship "within 5
days". I have myself ordered a few expansion cards in November 2022,
and they shipped on the same day, arriving 3-4 days later.
The supply pipeline
There are basically 6 steps in the Framework shipping pipeline, each
(except the last) accompanied by an email notification:
pre-order
preparing batch
preparing order
payment complete
shipping
(received)
This comes from the crowdsourced spreadsheet, which should be
updated when the status changes here.
I was part of the "third batch" of the 12th generation laptop, which
was supposed to ship in September. It ended up arriving on my
doorstep on September 27th, about 33 days after ordering.
It seems current orders are not processed in "batches", but in real
time, see this blog post for details on shipping.
Shipping trivia
I don't know about the others, but my laptop shipped through no less
than four different airplane flights. Here are the hops it took:
I can't quite figure out how to calculate exactly how much mileage
that is, but it's huge. The ride through Alaska is surprising enough
but the bounce back through Winnipeg is especially weird. I guess
the route happens that way because of FedEx shipping hubs.
There was a related oddity when I had my Purism laptop shipped: it
left from the west coast and seemed to enter on an endless, two week
long road trip across the continental US.
Akvorado collects sFlow and IPFIX flows, stores them in a
ClickHouse database, and presents them in a web console. Although it lacks
built-in DDoS detection, it's possible to create one by crafting custom
ClickHouse queries.
DDoS detection
Let's assume we want to detect DDoS attacks targeting our customers. As an example, we
consider a DDoS attack as a collection of flows over one minute targeting a
single customer IP address, from a single source port and matching one
of these conditions:
an average bandwidth of 1 Gbps,
an average bandwidth of 200 Mbps when the protocol is UDP,
more than 20 source IP addresses and an average bandwidth of 100 Mbps, or
more than 10 source countries and an average bandwidth of 100 Mbps.
Here is the SQL query to detect such attacks over the last 5 minutes:
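The full query is not reproduced here; the following is only a rough sketch of
what such a query could look like. The flows table and the column names
(TimeReceived, DstAddr, SrcAddr, SrcPort, Proto, SrcCountry, Bytes, Packets,
SamplingRate) are assumptions about the Akvorado schema, not the exact query:
-- Sketch only: schema names are assumed, adapt to your Akvorado setup.
SELECT
  toStartOfMinute(TimeReceived) AS Minute,
  DstAddr,
  SrcPort,
  Proto,
  sum(Bytes * SamplingRate) * 8 / 60 / 1e9 AS Gbps,
  sum(Packets * SamplingRate) / 60 / 1e6 AS Mpps,
  uniq(SrcAddr) AS sources,
  uniq(SrcCountry) AS countries
FROM flows
WHERE TimeReceived > now() - INTERVAL 5 MINUTE
GROUP BY Minute, DstAddr, SrcPort, Proto
HAVING Gbps >= 1
    OR (Proto = 17 AND Gbps >= 0.2)      -- 17 = UDP
    OR (sources > 20 AND Gbps >= 0.1)
    OR (countries > 10 AND Gbps >= 0.1)
ORDER BY Minute DESC, Gbps DESC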
DDoS remediation
Once detected, there are at least two ways to stop the attack at the network
level:
blackhole the traffic to the targeted user (RTBH), or
selectively drop packets matching the attack patterns (Flowspec).
Traffic blackhole
The easiest method is to sacrifice the attacked user. While this helps the
attacker, this protects your network. It is a method supported by all routers.
You can also offload this protection to many transit providers. This is useful
if the attack volume exceeds your internet capacity.
This works by advertising with BGP a route to the attacked user with a specific
community. The border router modifies the next hop address of these routes to a
specific IP address configured to forward the traffic to a null interface. RFC 7999 defines 65535:666 for this purpose. This is known as a
remote-triggered blackhole (RTBH) and is explained in more detail in RFC 3882.
It is also possible to blackhole the source of the attacks by leveraging
unicast Reverse Path Forwarding (uRPF) from RFC 3704, as explained in RFC 5635. However, uRPF can be a serious tax on your router resources. See
NCS5500 uRPF: Configuration and Impact on Scale for an example of the kind
of restrictions you have to expect when enabling uRPF.
On the advertising side, we can use BIRD. Here is a complete configuration
file to allow any router to collect these blackhole routes:
log stderr all;
router id 192.0.2.1;

protocol device {
  scan time 10;
}

protocol bgp exporter {
  ipv4 {
    import none;
    export where proto = "blackhole4";
  };
  ipv6 {
    import none;
    export where proto = "blackhole6";
  };
  local as 64666;
  neighbor range 192.0.2.0/24 external;
  multihop;
  dynamic name "exporter";
  dynamic name digits 2;
  graceful restart yes;
  graceful restart time 0;
  long lived graceful restart yes;
  long lived stale time 3600;  # keep routes for 1 hour!
}

protocol static blackhole4 {
  ipv4;
  route 203.0.113.206/32 blackhole {
    bgp_community.add((65535, 666));
  };
  route 203.0.113.68/32 blackhole {
    bgp_community.add((65535, 666));
  };
}
protocol static blackhole6 {
  ipv6;
}
We use BGP long-lived graceful restart to ensure routes are kept for
one hour, even if the BGP connection goes down, notably during maintenance.
On the receiver side, if you have a Cisco router running IOS XR, you can use the
following configuration to blackhole traffic received on the BGP session. As the
BGP session is dedicated to this usage, the community is not used, but you can
also forward these routes to your transit providers.
router static
 vrf public
  address-family ipv4 unicast
   192.0.2.1/32 Null0 description "BGP blackhole"
  !
  address-family ipv6 unicast
   2001:db8::1/128 Null0 description "BGP blackhole"
  !
 !
!
route-policy blackhole_ipv4_in_public
  if destination in (0.0.0.0/0 le 31) then
    drop
  endif
  set next-hop 192.0.2.1
  done
end-policy
!
route-policy blackhole_ipv6_in_public
  if destination in (::/0 le 127) then
    drop
  endif
  set next-hop 2001:db8::1
  done
end-policy
!
router bgp 12322
 neighbor-group BLACKHOLE_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv4_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
  address-family ipv6 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv6_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
 !
 vrf public
  neighbor 192.0.2.1
   use neighbor-group BLACKHOLE_IPV4_PUBLIC
   description akvorado-1
When the traffic is blackholed, it is still reported by IPFIX and sFlow.
In Akvorado, use ForwardingStatus >= 128 as a filter.
While this method is compatible with all routers, it makes the attack successful
as the target is completely unreachable. If your router supports it, Flowspec
can selectively filter flows to stop the attack without impacting the
customer.
Flowspec
Flowspec is defined in RFC 8955 and enables the transmission of flow
specifications in BGP sessions. A flow specification is a set of matching
criteria to apply to IP traffic. These criteria include the source and
destination prefix, the IP protocol, the source and destination port, and the
packet length. Each flow specification is associated with an action, encoded as an
extended community: traffic shaping, traffic marking, or redirection.
To announce flow specifications with BIRD, we extend our configuration. The
extended community used shapes the matching traffic to 0 bytes per second.
flow4 table flowtab4;
flow6 table flowtab6;

protocol bgp exporter {
  flow4 {
    import none;
    export where proto = "flowspec4";
  };
  flow6 {
    import none;
    export where proto = "flowspec6";
  };
  # [...]
}

protocol static flowspec4 {
  flow4;
  route flow4 {
    dst 203.0.113.68/32;
    sport = 53;
    length >= 1476 && <= 1500;
    proto = 17;
  } {
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
  route flow4 {
    dst 203.0.113.206/32;
    sport = 123;
    length = 468;
    proto = 17;
  } {
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
}
protocol static flowspec6 {
  flow6;
}
If you have a Cisco router running IOS XR, the configuration may look like
this:
vrf public
 address-family ipv4 flowspec
 address-family ipv6 flowspec
!
router bgp 12322
 address-family vpnv4 flowspec
 address-family vpnv6 flowspec
 neighbor-group FLOWSPEC_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
  address-family ipv6 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
 !
 vrf public
  address-family ipv4 flowspec
  address-family ipv6 flowspec
  neighbor 192.0.2.1
   use neighbor-group FLOWSPEC_IPV4_PUBLIC
   description akvorado-1
Then, you need to enable Flowspec on all interfaces with:
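The snippet itself is not included here; on IOS XR this is typically done with
local-install interface-all under the flowspec configuration. The following is
my assumption of what it looks like, verify against your platform's
documentation:
flowspec
 address-family ipv4
  local-install interface-all
 !
 address-family ipv6
  local-install interface-all
 !
!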
As with the RTBH setup, you can filter dropped flows with ForwardingStatus >=
128.
DDoS detection (continued)
In the example using Flowspec, the flows were also filtered on the length of the packet:
route flow4 {
  dst 203.0.113.68/32;
  sport = 53;
  length >= 1476 && <= 1500;
  proto = 17;
} {
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
};
This is an important addition: legitimate DNS requests are smaller than this and
therefore not filtered.[2] With ClickHouse, you can get the 10th
and 90th percentiles of the packet sizes with quantiles(0.1,
0.9)(Bytes/Packets).
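For instance, a quick check for one target could look like this; as before,
the flows table and column names are assumptions about the Akvorado schema:
-- Sketch only: packet-size percentiles for one destination over the last hour.
SELECT quantiles(0.1, 0.9)(Bytes / Packets) AS packet_sizes
FROM flows
WHERE DstAddr = toIPv6('203.0.113.68')
  AND TimeReceived > now() - INTERVAL 1 HOUR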
The last issue we need to tackle is how to optimize the query: it may take
several seconds to collect the data and it is likely to consume substantial
resources from your ClickHouse database. One solution is to create a
materialized view to pre-aggregate results:
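The exact definition is not included here; a rough sketch, using assumed table
and column names consistent with the query sketch above (SrcCountry is assumed
to be a FixedString(2)), could look like this:
-- Sketch only: pre-aggregated table plus the materialized view feeding it.
CREATE TABLE ddos_logs (
  TimeReceived DateTime,
  DstAddr IPv6,
  Proto UInt8,
  SrcPort UInt16,
  Gbps SimpleAggregateFunction(sum, Float64),
  Mpps SimpleAggregateFunction(sum, Float64),
  sources AggregateFunction(uniqCombined, IPv6),
  countries AggregateFunction(uniqCombined, FixedString(2)),
  size AggregateFunction(quantiles(0.1, 0.9), UInt64)
) ENGINE = SummingMergeTree
ORDER BY (TimeReceived, DstAddr, Proto, SrcPort);

CREATE MATERIALIZED VIEW ddos_logs_view TO ddos_logs AS
SELECT
  toStartOfMinute(TimeReceived) AS TimeReceived,
  DstAddr,
  Proto,
  SrcPort,
  sum(Bytes * SamplingRate * 8 / 1e9 / 60) AS Gbps,
  sum(Packets * SamplingRate / 1e6 / 60) AS Mpps,
  uniqCombinedState(SrcAddr) AS sources,
  uniqCombinedState(SrcCountry) AS countries,
  quantilesState(0.1, 0.9)(toUInt64(Bytes / Packets)) AS size
FROM flows
GROUP BY TimeReceived, DstAddr, Proto, SrcPort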
The ddos_logs table is using the SummingMergeTree engine. When the table
receives new data, ClickHouse replaces all the rows with the same sorting key,
as defined by the ORDER BY directive, with one row which contains summarized
values using either the sum() function or the explicitly specified aggregate
function (uniqCombined and quantiles in our example).[3]
Finally, we can modify our initial query with the following one:
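Again, only a sketch matching the layout assumed above; the aggregate states
stored in ddos_logs are finalized with the corresponding -Merge combinators:
-- Sketch only: same detection thresholds, now reading the pre-aggregated table.
SELECT
  TimeReceived,
  DstAddr,
  Proto,
  SrcPort,
  sum(Gbps) AS Gbps,
  sum(Mpps) AS Mpps,
  uniqCombinedMerge(sources) AS sources,
  uniqCombinedMerge(countries) AS countries,
  quantilesMerge(0.1, 0.9)(size) AS size
FROM ddos_logs
WHERE TimeReceived > now() - INTERVAL 5 MINUTE
GROUP BY TimeReceived, DstAddr, Proto, SrcPort
HAVING Gbps >= 1
    OR (Proto = 17 AND Gbps >= 0.2)
    OR (sources > 20 AND Gbps >= 0.1)
    OR (countries > 10 AND Gbps >= 0.1)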
Gluing everything together
To sum up, building an anti-DDoS system requires following these steps:
define a set of criteria to detect a DDoS attack,
translate these criteria into SQL requests,
pre-aggregate flows into SummingMergeTree tables,
query and transform the results to a BIRD configuration file, and
configure your routers to pull the routes from BIRD.
A Python script like the following one can handle the fourth step. For each
attacked target, it generates both a Flowspec rule and a blackhole route.
import filecmp
import logging
import os
import socket
import subprocess
import types

from clickhouse_driver import Client as CHClient

logger = logging.getLogger("ddos-mitigation")

# Put your SQL query here!
SQL_QUERY = "…"

# How many anti-DDoS rules we want at the same time?
MAX_DDOS_RULES = 20


def samefile(path1, path2):
    # Compare file contents (not inodes) to detect configuration changes.
    return filecmp.cmp(path1, path2, shallow=False)


def empty_ruleset():
    ruleset = types.SimpleNamespace()
    ruleset.flowspec = types.SimpleNamespace()
    ruleset.blackhole = types.SimpleNamespace()
    ruleset.flowspec.v4 = []
    ruleset.flowspec.v6 = []
    ruleset.blackhole.v4 = []
    ruleset.blackhole.v6 = []
    return ruleset


current_ruleset = empty_ruleset()

client = CHClient(host="clickhouse.akvorado.net")
while True:
    results = client.execute(SQL_QUERY)
    seen = {}
    new_ruleset = empty_ruleset()
    # One row per attack: the SQL query must return these columns in this order.
    for (t, addr, proto, port, gbps, mpps, flows, sources, countries, size) in results:
        if (addr, proto, port) in seen:
            continue
        seen[(addr, proto, port)] = True

        # Flowspec
        if addr.ipv4_mapped:
            address = addr.ipv4_mapped
            rules = new_ruleset.flowspec.v4
            table = "flow4"
            mask = 32
            nh = "proto"
        else:
            address = addr
            rules = new_ruleset.flowspec.v6
            table = "flow6"
            mask = 128
            nh = "next header"
        if size[0] == size[1]:
            length = f"length = {int(size[0])}"
        else:
            length = f"length >= {int(size[0])} && <= {int(size[1])}"
        header = f"""# Time: {t}
# Source: {address}, protocol: {proto}, port: {port}
# Gbps/Mpps: {gbps:.3}/{mpps:.3}, packet size: {int(size[0])}<=X<={int(size[1])}
# Flows: {flows}, sources: {sources}, countries: {countries}
"""
        rules.append(
            f"""{header}route {table} {{
  dst {address}/{mask};
  sport = {port};
  {length};
  {nh} = {socket.getprotobyname(proto)};
}} {{
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
}};
"""
        )

        # Blackhole
        if addr.ipv4_mapped:
            rules = new_ruleset.blackhole.v4
        else:
            rules = new_ruleset.blackhole.v6
        rules.append(
            f"""{header}route {address}/{mask} blackhole {{
  bgp_community.add((65535, 666));
}};
"""
        )

    new_ruleset.flowspec.v4 = list(set(new_ruleset.flowspec.v4[:MAX_DDOS_RULES]))
    new_ruleset.flowspec.v6 = list(set(new_ruleset.flowspec.v6[:MAX_DDOS_RULES]))
    # TODO: advertise changes by mail, chat, ...

    current_ruleset = new_ruleset
    changes = False
    for rules, path in (
        (current_ruleset.flowspec.v4, "v4-flowspec"),
        (current_ruleset.flowspec.v6, "v6-flowspec"),
        (current_ruleset.blackhole.v4, "v4-blackhole"),
        (current_ruleset.blackhole.v6, "v6-blackhole"),
    ):
        path = os.path.join("/etc/bird/", f"{path}.conf")
        with open(f"{path}.tmp", "w") as f:
            for r in rules:
                f.write(r)
        changes = (
            changes
            or not os.path.exists(path)
            or not samefile(path, f"{path}.tmp")
        )
        os.rename(f"{path}.tmp", path)

    if not changes:
        continue

    # Reload BIRD so it picks up the new route files.
    proc = subprocess.Popen(
        ["birdc", "configure"],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = proc.communicate(None)
    stdout = stdout.decode("utf-8", "replace")
    stderr = stderr.decode("utf-8", "replace")
    if proc.returncode != 0:
        logger.error(
            "{} error:\n{}\n{}".format(
                "birdc reconfigure",
                "\n".join([" O: {}".format(line) for line in stdout.rstrip().split("\n")]),
                "\n".join([" E: {}".format(line) for line in stderr.rstrip().split("\n")]),
            )
        )
Until Akvorado integrates DDoS detection and mitigation, the ideas presented
in this blog post provide a solid foundation to get started with your own
anti-DDoS system.
[1] ClickHouse can export results using Markdown format when
appending FORMAT Markdown to the query.
[2] While most DNS clients should retry with TCP on failures, this is not
always the case: until recently, musl libc did not implement this.
[3] The materialized view also aggregates the data at hand, both
for efficiency and to ensure we work with the right data types.
My toilet is equipped with a Geberit Sigma 70 flush plate. The sales pitch
for this hydraulic-assisted device praises the
ingenious mount that acts like a rocker switch. In practice, the flush is very
capricious and has a very high failure rate. Avoid this type of
mechanism! Prefer a fully mechanical version like the Geberit Sigma 20.
After several plumber visits, exchanges with Geberit's technical department, and the
expensive replacement of the entire mechanism, I was still getting a failure rate
of over 50% for the small flush. I finally managed to decrease this rate to 5%
by applying two 8 mm silicone bumpers on the back of the plate. Their
locations are indicated by red circles on the picture below:
Expect to pay about 5 € and as many minutes for this operation.